
Conversation

@tarilabs

@tarilabs tarilabs commented Nov 19, 2025

Summary by Sourcery

Add support for uploading LMEvalJob evaluation results to OCI registries, including CRD/API extensions, controller wiring, driver integration, validation, and tests.

New Features:

  • Allow LMEvalJob outputs to be stored as OCI artifacts via a new OCISpec configuration on Outputs.
  • Expose a driver flag and behavior to upload evaluation results to an OCI registry when configured.

Enhancements:

  • Configure pods with OCI-related environment variables, secret-backed auth, optional docker config volumes, and SSL verification settings when OCI output is enabled.
  • Improve controller logging around configuration construction and DSC application.
  • Refine PVC mounting logic to only apply when PVC-based outputs are used.

Build:

  • Print params.env from base and overlay configs in the CI build-and-push workflow for debugging.

Documentation:

  • Extend the LMEvalJob CRD schema documentation to describe the new OCI outputs configuration fields.

Tests:

  • Add unit tests covering OCI command-line flag generation, pod configuration (env, volumes, auth variants, tag and subject handling), OCI helper methods, OCI path and auth validation, end-to-end driver OCI upload behavior, and validation of OCI-related user input.

Summary by CodeRabbit

  • New Features

    • OCI registry output target for evaluation results (registry/repository/tag/subject and SSL toggle)
    • Configurable OCI authentication via username/password or Docker config
    • CLI flag to enable uploading results to OCI
  • Validation

    • Path and authentication validation for OCI outputs to prevent unsafe or invalid inputs
  • Tests

    • Extensive tests for OCI command generation, pod configuration, and validation
  • Chores

    • CI now prints parameter files for debugging; build image setup updated (runtime dirs, tooling, env)


@tarilabs tarilabs added the needs-lmes-build Pull requests that need a new build of the LMES driver and job images for CI label Nov 19, 2025
@openshift-ci

openshift-ci bot commented Nov 19, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci

openshift-ci bot commented Nov 19, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai
Contributor

sourcery-ai bot commented Nov 19, 2025

Reviewer's Guide

Adds first-class OCI registry upload support for LMEvalJob outputs by introducing an OCISpec CRD/API, wiring it through pod/env/volume construction and driver CLI behavior, and validating configuration plus upload flow with unit tests.

Sequence diagram for OCI upload flow after LMEvalJob completion

sequenceDiagram
  actor User
  participant APIServer
  participant LMEvalJobController
  participant Pod
  participant DriverProcess
  participant OCIScript
  participant OCIRegistry

  User->>APIServer: create LMEvalJob with Outputs.OCISpec
  APIServer->>LMEvalJobController: watch LMEvalJob
  LMEvalJobController->>LMEvalJobController: ValidateUserInput (ValidateOCIAuth)
  LMEvalJobController->>LMEvalJobController: generateCmd (adds --upload-to-oci)
  LMEvalJobController->>LMEvalJobController: CreatePod (inject OCI_* env, volumes)
  LMEvalJobController->>APIServer: create Pod
  APIServer-->>Pod: schedule and start

  Pod-->>DriverProcess: start lmes_driver --upload-to-oci
  DriverProcess->>DriverProcess: run evaluation
  DriverProcess->>DriverProcess: getResults
  DriverProcess->>DriverProcess: uploadToOCI (UploadToOCI true)
  DriverProcess->>OCIScript: exec oci.py OCI_REGISTRY resultsLocation
  OCIScript->>OCIRegistry: push artifact
  OCIRegistry-->>OCIScript: success response
  OCIScript-->>DriverProcess: command output
  DriverProcess-->>APIServer: update LMEvalJobStatus.Results and completion

Sequence diagram for OCI configuration validation and pod construction

sequenceDiagram
  participant User
  participant APIServer
  participant LMEvalJobController

  User->>APIServer: submit LMEvalJob with Outputs.OCISpec
  APIServer-->>LMEvalJobController: reconcile event
  LMEvalJobController->>LMEvalJobController: ValidateUserInput
  LMEvalJobController->>LMEvalJobController: ValidateOCIAuth(OCISpec)
  alt invalid OCI auth
    LMEvalJobController-->>APIServer: update LMEvalJob Status with error
  else valid OCI auth
    LMEvalJobController->>LMEvalJobController: generateCmd (append --upload-to-oci)
    LMEvalJobController->>LMEvalJobController: CreatePod
    LMEvalJobController->>LMEvalJobController: add OCI_* env vars
    LMEvalJobController->>LMEvalJobController: mount docker config secret volume
    LMEvalJobController-->>APIServer: create Pod
  end

Entity relationship diagram for new OCISpec in LMEvalJob outputs

erDiagram
  LMEvalJobSpec ||--o{ Outputs : has
  Outputs ||--o| OCISpec : oci

  LMEvalJobSpec {
    string Name
    Outputs Outputs
    bool HasOCIOutput()
    bool HasOfflineS3()
    bool HasCustomOutput()
  }

  Outputs {
    string PersistentVolumeClaimName
    PersistentVolumeClaimManaged PersistentVolumeClaimManaged
    OCISpec OCISpec
    bool HasManagedPVC()
    bool HasExistingPVC()
    bool HasOCI()
  }

  OCISpec {
    SecretKeySelector Registry
    string Repository
    string Tag
    string Subject
    SecretKeySelector UsernameRef
    SecretKeySelector PasswordRef
    SecretKeySelector DockerConfigJsonRef
    bool VerifySSL
    bool HasUsernamePassword()
    bool HasDockerConfigJson()
  }

  SecretKeySelector {
    string Name
    string Key
    bool Optional
  }

  PersistentVolumeClaimManaged {
    string StorageClassName
    string Size
  }

Class diagram for OCISpec wiring, driver options, and validation

classDiagram
  class LMEvalJobSpec {
    +Outputs Outputs
    +OfflineStorageSpec OfflineStorageSpec
    +bool HasOfflineS3()
    +bool HasOCIOutput()
    +bool HasCustomOutput()
  }

  class Outputs {
    +string PersistentVolumeClaimName
    +PersistentVolumeClaimManaged PersistentVolumeClaimManaged
    +OCISpec OCISpec
    +bool HasManagedPVC()
    +bool HasExistingPVC()
    +bool HasOCI()
  }

  class OCISpec {
    +SecretKeySelector Registry
    +string Repository
    +string Tag
    +string Subject
    +SecretKeySelector UsernameRef
    +SecretKeySelector PasswordRef
    +SecretKeySelector DockerConfigJsonRef
    +bool VerifySSL
    +bool HasUsernamePassword()
    +bool HasDockerConfigJson()
  }

  class SecretKeySelector {
    +string Name
    +string Key
    +bool Optional
  }

  class DriverOption {
    +string OutputPath
    +bool DownloadAssetsS3
    +bool UploadToOCI
    +int CommPort
    +string CustomTaskGitURL
    +string CustomTaskGitBranch
    +string CustomTaskGitCommit
  }

  class driverImpl {
    +DriverOption Option
    +Status status
    +error downloadS3Assets()
    +error uploadToOCI()
    +void updateCompleteStatus(error err)
    +string getResults()
  }

  class ValidationHelpers {
    +error ValidateUserInput(LMEvalJob job)
    +error ValidateOCIPath(string path)
    +error ValidateOCIAuth(OCISpec ociSpec)
  }

  class CreatePodLogic {
    +Pod CreatePod(ServiceOptions svcOpts, LMEvalJob job, PermConfig permConfig)
  }

  class GenerateCmdLogic {
    +[]string generateCmd(ServiceOptions svcOpts, LMEvalJob job, PermConfig permConfig)
  }

  LMEvalJobSpec --> Outputs : has
  Outputs --> OCISpec : oci
  OCISpec --> SecretKeySelector : uses
  DriverOption <.. driverImpl : uses
  ValidationHelpers <.. LMEvalJobSpec : validated_by
  ValidationHelpers <.. OCISpec : validated_by
  CreatePodLogic --> LMEvalJobSpec : reads
  CreatePodLogic --> OCISpec : injects_env
  GenerateCmdLogic --> LMEvalJobSpec : reads
  GenerateCmdLogic --> DriverOption : sets_uploadToOCI

File-Level Changes

Change Details Files
Introduce OCISpec API and CRD schema for configuring OCI uploads from LMEvalJob outputs.
  • Add OCISpec struct to LMEvalJob Outputs with registry, repository, tag, subject, auth and SSL fields.
  • Generate deepcopy implementations for OCISpec and wire it into Outputs.DeepCopy.
  • Extend LMEvalJobSpec and Outputs helpers with HasOCIOutput and HasOCI methods.
  • Update LMEvalJob CRD schema to include the new outputs.oci section with validation patterns and required fields.
api/lmes/v1alpha1/lmevaljob_types.go
api/lmes/v1alpha1/zz_generated.deepcopy.go
config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml
Propagate OCI configuration into the created pod via environment variables and optional docker config volume/volumeMount.
  • Adjust CreatePod output PVC handling to explicitly check for managed/existing PVCs instead of HasCustomOutput.
  • When Outputs.OCISpec is present, inject OCI_* environment variables for registry, repository, tag (defaulting to pod name), optional subject, and verify SSL.
  • If username/password auth is configured, add OCI_USERNAME and OCI_PASSWORD env vars sourced from secrets.
  • If dockerConfigJson auth is configured, add a docker-config volume and mount pointing to /tmp/.docker/config.json backed by the referenced secret.
controllers/lmes/lmevaljob_controller.go
Extend driver to support an --upload-to-oci flag and perform post-completion upload via an external OCI script.
  • Add UploadToOCI field to DriverOption and plumb it from a new --upload-to-oci CLI flag in lmes_driver main.
  • Implement driverImpl.uploadToOCI to read OCI_REGISTRY/OCI_PATH from env, compute resultsLocation, and execute scripts/oci.py with debug output, returning detailed errors on failure.
  • Invoke uploadToOCI from updateCompleteStatus after successfully retrieving results so failures surface in status/messages.
controllers/lmes/driver/driver.go
cmd/lmes_driver/main.go
Validate OCI configuration (path and authentication) as part of LMEvalJob user input validation.
  • Add ValidateOCIPath utility enforcing allowed characters and rejecting traversal patterns and shell metacharacters.
  • Add ValidateOCIAuth utility ensuring OCISpec is non-nil and has either username/password or dockerConfigJson, but not both, and that username/password come as a pair.
  • Integrate OCI validation into ValidateUserInput when HasOCIOutput is true, wrapping any errors as invalid OCI authentication.
controllers/lmes/validation.go
Add unit tests to cover OCI helpers, validation, pod configuration, command generation, and driver upload behavior.
  • Add tests for generateCmd ensuring --upload-to-oci is included/excluded based on HasOCIOutput.
  • Add CreatePod tests verifying injected OCI_* env vars, subject/tag handling, username/password vs docker config cases, and docker-config volume wiring.
  • Add helper tests for HasOCIOutput/HasOCI and OCISpec helper methods (HasUsernamePassword, HasDockerConfigJson).
  • Add validation tests for ValidateOCIPath, ValidateOCIAuth, and ValidateUserInput with valid/invalid OCI specs.
  • Add driver tests for OCI upload flow: success path (script missing error), disabled upload, missing registry, and direct uploadToOCI error behavior.
controllers/lmes/lmevaljob_controller_test.go
controllers/lmes/lmevaljob_controller_validation_test.go
controllers/lmes/driver/driver_test.go
Add extra logging and minor CI workflow diagnostics related to configuration.
  • Log constructed options from ConfigMap and applied DSC configuration in SetupWithManager for easier debugging.
  • In build-and-push GitHub workflow, cat params.env files for base and overlays before pushing artifacts.
controllers/lmes/lmevaljob_controller.go
.github/workflows/build-and-push.yaml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Warning

Rate limit exceeded

@tarilabs has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 22 minutes and 34 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between ef3180c and 893c736.

📒 Files selected for processing (1)
  • controllers/lmes/driver/driver.go (3 hunks)

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds OCI output support: new OCISpec API and CRD fields, deepcopy methods, controller wiring to inject OCI envs/volumes and generate an upload flag, driver plumbing to perform OCI uploads, validation functions and tests, Dockerfile additions, and CI debug output.

Changes

Cohort / File(s) Summary
API & deepcopy
api/lmes/v1alpha1/lmevaljob_types.go, api/lmes/v1alpha1/zz_generated.deepcopy.go
Add OCISpec type (registry, repository, tag, subject, auth refs, verifySSL), helper methods (HasOCIOutput, HasUsernamePassword, HasDockerConfigJson), wire OCI into Outputs, and add deepcopy implementations.
CRD
config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml
Add OCI output schema to CRD with required registry/repository, secret selectors for auth, and verifySSL field.
Controller: pod creation & command
controllers/lmes/lmevaljob_controller.go
Refactor output checks; inject OCI env vars, volumes, and docker-config secret/volume mounts when OCI outputs configured; add --upload-to-oci flag to generated driver command; add setup logging.
Controller validation & helpers
controllers/lmes/validation.go
Add ValidateOCIPath() and ValidateOCIAuth() and integrate OCI validation into ValidateUserInput().
Controller tests
controllers/lmes/lmevaljob_controller_test.go, controllers/lmes/lmevaljob_controller_validation_test.go
Add extensive OCI tests: command generation, pod/env/volume/secret behaviors, subject/tag variants, and validation helper coverage.
Driver & tests
cmd/lmes_driver/main.go, controllers/lmes/driver/driver.go, controllers/lmes/driver/driver_test.go
Add --upload-to-oci CLI flag and UploadToOCI option; implement uploadToOCI() invoked on successful completion when enabled; add tests for enabled/disabled/missing-registry/error scenarios.
Build image
Dockerfile.lmes-job
Install skopeo, create project directories and cache, set HF_HOME/UNITXT_CATALOGS, fetch and install lm-evaluation-harness, write unitxt hint, and switch to non-root user.
CI debug
.github/workflows/build-and-push.yaml
Print contents of config/base/params.env, config/overlays/odh/params.env, and config/overlays/rhoai/params.env during CI after cleanup step.
sequenceDiagram
    participant User
    participant Controller
    participant Pod
    participant Driver
    participant OCI as "OCI Registry"

    User->>Controller: Submit LMEvalJob with OCISpec
    Controller->>Controller: Validate OCI path & auth
    Controller->>Pod: Create Pod (inject OCI envs, mount secrets if needed)
    Pod->>Pod: Run evaluation harness → produce results
    Pod->>Driver: Start driver (includes --upload-to-oci if configured)
    Driver->>Driver: Check UploadToOCI flag
    alt UploadToOCI true
        Driver->>OCI: Upload results using OCI envs & auth
        OCI-->>Driver: Success / Failure
        Driver->>Controller: Report upload outcome
    else UploadToOCI false
        Driver->>Controller: Report job completion
    end
    Controller->>User: Update LMEvalJob status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Files/areas needing extra attention:
    • Duplicate OCI env/volume insertion points in controllers/lmes/lmevaljob_controller.go.
    • Regex/rules and edge cases in ValidateOCIPath.
    • Mutual-exclusivity and secret handling in ValidateOCIAuth.
    • Shell/exec error handling and environment usage in uploadToOCI().
    • Tests asserting secret mounts, volumeSources, and environment propagation.

Suggested labels

lgtm

Suggested reviewers

  • sheltoncyril

Poem

🐇 I sniffed a registry, tidy and sly,

Env vars set, secrets tucked by,
Volumes mounted, commands in flight,
The driver hops — results take flight,
OCI-bound beneath moonlight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title "merge dev/oci feature to main" is vague and uses generic terminology that does not clearly describe the primary changes in the changeset. Revise the title to be more specific and descriptive, such as "Add OCI artifact output support for LMEvalJob" to clearly convey the main feature being merged.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions bot commented Nov 19, 2025

PR image build and manifest generation completed successfully!

📦 PR image: quay.io/trustyai/trustyai-service-operator:893c73685a3420fc842beec497b303edeafc15e5

📦 LMES driver image: quay.io/trustyai/ta-lmes-driver-ci:893c73685a3420fc842beec497b303edeafc15e5

📦 LMES job image: quay.io/trustyai/ta-lmes-job-ci:893c73685a3420fc842beec497b303edeafc15e5

📦 Guardrails orchestrator image: quay.io/trustyai/ta-guardrails-orchestrator:latest

🗂️ CI manifests

      devFlags:
        manifests:
          - contextDir: config
            sourcePath: ''
            uri: https://api.github.com/repos/trustyai-explainability/trustyai-service-operator-ci/tarball/operator-893c73685a3420fc842beec497b303edeafc15e5

@tarilabs
Author

testing with DSC set as:

    trustyai:
      devFlags:
        manifests:
          - contextDir: config
            sourcePath: ''
            uri: 'https://api.github.com/repos/tarilabs/trustyai-service-operator/tarball/tarilabs-20251127'
      eval:
        lmeval:
          permitCodeExecution: allow
          permitOnline: allow
      managementState: Managed

pointing to this commit: tarilabs@63212f9
for the relevant images

using this CR:

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: test-oci-job
  namespace: ds-prj1
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf
  modelArgs:
  - name: pretrained
    value: EleutherAI/pythia-70m
  taskList:
    taskNames:
    - unfair_tos # arc_easy, unfair_tos
  logSamples: true
  limit: "5"
  outputs:
    pvcManaged: 
      size: 5Gi
    oci:
      registry:
        name: my-oci-credentials
        key: OCI_HOST
      repository: mmortari/demo20251016-lmeh-olot
      tag: test20251127v3
      # subject: "quay.io/mmortari/demo20251016-lmeh-olot:test20251127v1"
      dockerConfigJson:
        name: my-oci-credentials
        key: .dockerconfigjson

assuming in ODH this OCI Connection resource:

Screenshot 2025-11-27 at 11 45 31

result in completed LMEval and pushed artifact ...v3:

image

and in Quay:

Screenshot 2025-11-27 at 16 56 38 (2)

please note this artifact is a CRI-O ImageVolume-mount-compatible image

then, using ...v3 as a stand-in for a Model/Artifact to be measured,
we can create a CR instance that refers to ...v3 as its subject:

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: test-oci-job
  namespace: ds-prj1
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf
  modelArgs:
  - name: pretrained
    value: EleutherAI/pythia-70m
  taskList:
    taskNames:
    - unfair_tos # arc_easy, unfair_tos
  logSamples: true
  limit: "5"
  outputs:
    pvcManaged: 
      size: 5Gi
    oci:
      registry:
        name: my-oci-credentials
        key: OCI_HOST
      repository: mmortari/demo20251016-lmeh-olot
      tag: test20251127v4
      subject: "quay.io/mmortari/demo20251016-lmeh-olot:test20251127v3"
      dockerConfigJson:
        name: my-oci-credentials
        key: .dockerconfigjson

results in the Job completing:

Screenshot 2025-11-27 at 17 00 06

and the artifact in Quay refers to the previous ...v3 artifact,
so we can conclude that ...v4 records a measurement of ...v3, which stood in as the Model/Asset subject of this (...v4) measurement:

Screenshot 2025-11-27 at 17 04 06 (2)

because ...v4 refers to the ...v3 as a subject.

Screenshot 2025-11-27 at 17 06 32 (2)

@tarilabs tarilabs marked this pull request as ready for review November 27, 2025 16:12
@tarilabs tarilabs changed the title [DO NOT MERGE] dev/oci rebase, need the images merge dev/oci feature to main Nov 27, 2025
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • The driver’s uploadToOCI relies on OCI_PATH to locate results, and several tests assert OCI_PATH is set, but CreatePod only wires OCI_REGISTRY, OCI_REPOSITORY, OCI_TAG, OCI_SUBJECT, and OCI_VERIFY_SSL; consider either adding an explicit Path field to OCISpec and setting OCI_PATH in the pod env, or removing OCI_PATH usage/tests to avoid a broken/mismatched configuration.
  • ValidateOCIPath is implemented and heavily tested but is never invoked by ValidateUserInput or anywhere else; either wire it into the OCI validation flow (e.g., for a new path field or to validate repository if that was the intent) or drop the dead code and tests to keep the validation logic coherent.
## Individual Comments

### Comment 1
<location> `config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml:343-345` </location>
<code_context>
+                        - key
+                        type: object
+                        x-kubernetes-map-type: atomic
+                      repository:
+                        description: Repository path (e.g., myorg/evaluation-results)
+                        pattern: ^[a-zA-Z0-9._/-]*$
+                        type: string
+                      subject:
</code_context>

<issue_to_address>
**issue (bug_risk):** Repository field is required but regex allows empty string, which may lead to invalid-but-accepted configuration

The CRD marks `repository` as required, but the pattern `^[a-zA-Z0-9._/-]*$` still permits an empty string, so invalid configs can pass validation and only fail at runtime. Please change the pattern to require at least one character (e.g. `^[a-zA-Z0-9._/-]+$`) so empty values are rejected by the schema.
</issue_to_address>

### Comment 2
<location> `controllers/lmes/driver/driver.go:259-266` </location>
<code_context>
+		return fmt.Errorf("OCI_REGISTRY environment variable not set")
+	}
+
+	pathFromEnv := os.Getenv("OCI_PATH")
+	var resultsLocation string
+	if pathFromEnv == "" {
+		// If OCI_PATH is not set, use the output path directly
+		resultsLocation = d.Option.OutputPath
+	} else {
+		// If OCI_PATH is set, join it with the output path
+		resultsLocation = filepath.Join(d.Option.OutputPath, pathFromEnv)
+	}
+
</code_context>

<issue_to_address>
**🚨 issue (security):** Unvalidated OCI_PATH environment variable can escape the intended output directory

`OCI_PATH` is joined directly with `d.Option.OutputPath` via `filepath.Join`. If it can be influenced by users and include `../` or similar segments, this becomes a path traversal risk, allowing access outside the intended directory. Since you already have `ValidateOCIPath`, please either validate `pathFromEnv` before joining or strictly control and document that `OCI_PATH` cannot contain unsafe path components.
</issue_to_address>

### Comment 3
<location> `controllers/lmes/driver/driver.go:269-295` </location>
<code_context>
+		resultsLocation = filepath.Join(d.Option.OutputPath, pathFromEnv)
+	}
+
+	cmd := []string{"python", "/opt/app-root/src/scripts/oci.py", registryFromEnv, resultsLocation}
+	fmt.Printf("[DEBUG] OCI upload CLI: %v\n", cmd)
+
+	// List all files and directories in resultsLocation
+	fmt.Printf("[DEBUG] Contents of results location (%s):\n", resultsLocation)
+	_ = filepath.Walk(resultsLocation, func(path string, info os.FileInfo, err error) error {
+		if err != nil {
+			fmt.Printf("  [error] %v\n", err)
+			return nil
+		}
+		rel, _ := filepath.Rel(resultsLocation, path)
+		if rel == "." {
+			fmt.Printf("  %s/\n", rel)
+		} else if info.IsDir() {
+			fmt.Printf("  %s/\n", rel)
+		} else {
+			fmt.Printf("  %s\n", rel)
+		}
+		return nil
+	})
+
+	output, err := exec.Command(
+		"python",
+		"/opt/app-root/src/scripts/oci.py",
</code_context>

<issue_to_address>
**suggestion:** Redundant construction of `cmd` slice that is not used for the exec call

`cmd` is only used for logging while the actual call duplicates the arguments in `exec.Command`. This risks the logged command drifting from what is executed. Either reuse `cmd` with `exec.Command(cmd[0], cmd[1:]...)` or remove `cmd` and format the arguments directly in the debug log.

```suggestion
	cmd := []string{"python", "/opt/app-root/src/scripts/oci.py", registryFromEnv, resultsLocation}
	fmt.Printf("[DEBUG] OCI upload CLI: %v\n", cmd)

	// List all files and directories in resultsLocation
	fmt.Printf("[DEBUG] Contents of results location (%s):\n", resultsLocation)
	_ = filepath.Walk(resultsLocation, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			fmt.Printf("  [error] %v\n", err)
			return nil
		}
		rel, _ := filepath.Rel(resultsLocation, path)
		if rel == "." {
			fmt.Printf("  %s/\n", rel)
		} else if info.IsDir() {
			fmt.Printf("  %s/\n", rel)
		} else {
			fmt.Printf("  %s\n", rel)
		}
		return nil
	})

	output, err := exec.Command(cmd[0], cmd[1:]...).Output()
```
</issue_to_address>

### Comment 4
<location> `.github/workflows/build-and-push.yaml:174-176` </location>
<code_context>
           rm -Rf $(ls . | grep -v config)
           rm -Rf .gitignore .dockerignore .github .git .yamllint.yaml
+
+          cat ./config/base/params.env
+          cat ./config/overlays/odh/params.env
+          cat ./config/overlays/rhoai/params.env
       # pysh to ci-manifest repo
       - uses: cpina/github-action-push-to-another-repository@main
</code_context>

<issue_to_address>
**🚨 issue (security):** Printing params.env files in CI logs may expose sensitive configuration

These `cat` commands will dump the full contents of the `params.env` files into the GitHub Actions logs. If any of these files hold credentials, tokens, or other sensitive config, that data becomes exposed in logs (and any downstream log aggregation). Please avoid printing them directly—either remove these commands, or restrict output to known non-sensitive values and ensure any secrets are masked.
</issue_to_address>

### Comment 5
<location> `controllers/lmes/lmevaljob_controller_test.go:3866-3875` </location>
<code_context>
+func Test_OCICommandGeneration(t *testing.T) {
</code_context>

<issue_to_address>
**suggestion (testing):** Consider extending OCI command tests to cover additional edge cases for flag generation

Currently `Test_OCICommandGeneration` only checks the presence vs absence of `Outputs.OCISpec`. Given the `generateCmd` logic, please add tests for:

- A job with non-nil `Outputs` but nil `OCISpec`, verifying `--upload-to-oci` is not added.
- A job with both a custom output PVC and an OCI spec (and/or offline mode), verifying `--upload-to-oci` is still set correctly alongside flags like `--download-assets-s3`.

This will confirm that the flag is driven solely by `HasOCIOutput()` and is unaffected by other output settings.

Suggested implementation:

```golang
func Test_OCICommandGeneration(t *testing.T) {
	svcOpts := &serviceOptions{
		PodImage:         "podimage:latest",
		DriverImage:      "driver:latest",
		ImagePullPolicy:  corev1.PullAlways,
		MaxBatchSize:     20,
		DefaultBatchSize: "4",
	}

	// Job with non-nil Outputs but nil OCISpec should NOT trigger --upload-to-oci
	t.Run("NonNilOutputsNilOCISpec", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Outputs: &lmesv1alpha1.LMEvalJobOutputs{
					// OCISpec intentionally nil
				},
			},
		}

		args := generateArgs(svcOpts, job, log)
		assert.NotContains(t, args, "--upload-to-oci")
	})

	// Job with a custom output PVC and an OCI spec should still trigger --upload-to-oci
	t.Run("CustomPVCAndOCIOutput", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Outputs: &lmesv1alpha1.LMEvalJobOutputs{
					OutputPVC: &lmesv1alpha1.OutputPVCSpec{
						Name: "custom-output-pvc",
					},
					OCISpec: &lmesv1alpha1.OCIOutputSpec{
						Repository: "example.com/my/repo",
					},
					// If there is an "Offline" or similar flag that affects asset download,
					// it should be set here as appropriate in the real type.
				},
			},
		}

		args := generateArgs(svcOpts, job, log)
		assert.Contains(t, args, "--upload-to-oci")
		// Optionally also assert the coexistence with other flags like asset-download flags,
		// e.g. assert.Contains(t, args, "--download-assets-s3") once you confirm the exact flag.
	})

	t.Run("WithOCIOutput", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{

```

The above edits assume the following type and field names, which you may need to adjust to match your actual API:

- `lmesv1alpha1.LMEvalJobSpec` is the type of `Spec` on `LMEvalJob`.
- `Spec.Outputs` is of type `*lmesv1alpha1.LMEvalJobOutputs` (or similarly named).
- `LMEvalJobOutputs` has fields:
  - `OCISpec *lmesv1alpha1.OCIOutputSpec`
  - `OutputPVC *lmesv1alpha1.OutputPVCSpec`
- `OutputPVCSpec` has a `Name string` field (or equivalent identifying PVC name).
- `log` is already defined in the test file scope and used by the existing `generateArgs` calls.
- The flag you want to assert for asset download (e.g. `--download-assets-s3`) should be updated to match the actual flag emitted by `generateCmd`/`generateArgs`.

To fully implement your suggestion:
1. Align the struct types and field names in the two new subtests with your actual CRD types.
2. If `HasOCIOutput()` is influenced by additional fields (e.g. a boolean enabling OCI output), set those explicitly in the new test jobs to ensure the tests mirror real-world configurations.
3. If your command actually emits a specific asset-download flag (such as `--download-assets-s3` or `--offline`), add concrete `assert.Contains` checks for that flag in `CustomPVCAndOCIOutput` so the test verifies coexistence of `--upload-to-oci` with other output-related flags.
</issue_to_address>

### Comment 6
<location> `controllers/lmes/lmevaljob_controller_test.go:3921-3930` </location>
<code_context>
+func Test_OCIPodConfiguration(t *testing.T) {
</code_context>

<issue_to_address>
**suggestion (testing):** OCI pod configuration tests could add coverage for VerifySSL and mutual exclusivity of auth mechanisms

Two focused additions would help:

1. Add a subtest where `VerifySSL` is explicitly `boolPtr(false)` and assert `OCI_VERIFY_SSL == "false"` to confirm the field override is honored.
2. Add a subtest for the username/password path that asserts the `docker-config` volume and corresponding volumeMount are *absent* when `HasUsernamePassword()` is true.

Both changes stay within this test suite and more completely exercise the OCI pod setup rules.

Suggested implementation:

```golang
func Test_OCIPodConfiguration(t *testing.T) {

```

You’ll need to edit the body of `Test_OCIPodConfiguration` (which isn’t fully visible in the snippet). The following describes concrete code to add, assuming the surrounding helpers follow typical controller test patterns in this file:

1. **Add a subtest for `VerifySSL: boolPtr(false)`**

Inside `Test_OCIPodConfiguration`, at the same level as `t.Run("BasicOCIConfiguration", ...)`, add something along these lines:

```go
	t.Run("VerifySSLFalseOverridesDefault", func(t *testing.T) {
		logger := log.FromContext(context.Background())

		// Arrange: Create an LMEvalJob with OCI config and VerifySSL explicitly false
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: metav1.ObjectMeta{
				Name:      "oci-verifyssl-false",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				// ... whatever fields BasicOCIConfiguration uses ...
				OCI: &lmesv1alpha1.OCIConfig{
					Enabled:   pointer.Bool(true),
					VerifySSL: pointer.Bool(false),
					// keep the rest of the fields the same as BasicOCIConfiguration so we only vary VerifySSL
				},
			},
		}

		// Act: generate the pod (or job) that configures the OCI upload
		pod, err := buildEvalPod(logger, svcOpts, job) // adjust to the actual helper / function used in this file
		require.NoError(t, err)

		// Assert: there is an env var OCI_VERIFY_SSL == "false"
		var verifySSLEnv *corev1.EnvVar
		for i := range pod.Spec.Containers {
			for j := range pod.Spec.Containers[i].Env {
				if pod.Spec.Containers[i].Env[j].Name == "OCI_VERIFY_SSL" {
					verifySSLEnv = &pod.Spec.Containers[i].Env[j]
					break
				}
			}
		}
		require.NotNil(t, verifySSLEnv, "expected OCI_VERIFY_SSL env var to be set when VerifySSL is specified")
		assert.Equal(t, "false", verifySSLEnv.Value, "VerifySSL=false should set OCI_VERIFY_SSL to \"false\"")
	})
```

Adjust the constructor call `buildEvalPod(...)` and `OCIConfig` path (`lmesv1alpha1` package, field names) to match the existing code in this test file. If `boolPtr` is already defined as a helper, use `VerifySSL: boolPtr(false)` instead of `pointer.Bool(false)` to be consistent.

2. **Add a subtest to ensure docker-config volume is absent when username/password auth is used**

Still inside `Test_OCIPodConfiguration`, add another sibling subtest:

```go
	t.Run("UsernamePasswordAuthExcludesDockerConfigVolume", func(t *testing.T) {
		logger := log.FromContext(context.Background())

		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: metav1.ObjectMeta{
				Name:      "oci-username-password",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				// ... same structure as other OCI tests ...
				OCI: &lmesv1alpha1.OCIConfig{
					Enabled: pointer.Bool(true),
					// Configure username/password auth; no docker config secret
					Registry:  "registry.example.com",
					Username:  pointer.String("test-user"),
					Password:  &corev1.SecretKeySelector{
						LocalObjectReference: corev1.LocalObjectReference{Name: "oci-password-secret"},
						Key:                  "password",
					},
					// Important: do NOT set DockerConfigSecret or equivalent field here,
					// so HasUsernamePassword() resolves to true and "docker-config" is not used.
				},
			},
		}

		pod, err := buildEvalPod(logger, svcOpts, job) // adjust helper name
		require.NoError(t, err)

		// Assert: no docker-config volume
		for _, v := range pod.Spec.Volumes {
			assert.NotEqual(t, "docker-config", v.Name, "docker-config volume must not be present for username/password auth")
		}

		// Assert: no docker-config volume mount on any container
		for _, c := range pod.Spec.Containers {
			for _, m := range c.VolumeMounts {
				assert.NotEqual(t, "docker-config", m.Name, "docker-config volumeMount must not be present for username/password auth")
			}
		}
	})
```

Again, adjust:
- The `OCIConfig` struct and fields to match your CRD’s Go types.
- The pod-building function (`buildEvalPod`, `makeEvalJobPod`, `generatePod`, etc.) to match what `BasicOCIConfiguration` uses.
- The literal `"docker-config"` to whatever name your existing OCI docker-config volume and mount actually use in the BasicOCIConfiguration test.

3. **Keep tests within this suite and consistent with existing helpers**

Make sure the new subtests:
- Use the same `svcOpts` created at the top of `Test_OCIPodConfiguration`.
- Mirror how `BasicOCIConfiguration` constructs the `LMEvalJob` and generates the Pod/Job so that only the auth-related fields (`VerifySSL`, username/password vs docker config) differ.
- Reuse any helper functions (`boolPtr`, `strPtr`, `newTestJob`, etc.) that are already defined in this test file or its test helpers.

Once you wire those subtests into `Test_OCIPodConfiguration`, you will have direct coverage for both the `VerifySSL` override behavior and the mutual exclusivity between username/password and docker-config based auth.
</issue_to_address>

### Comment 7
<location> `controllers/lmes/lmevaljob_controller_validation_test.go:1566` </location>
<code_context>
+	})
+}
+
+func Test_ValidateOCIPath(t *testing.T) {
+	testCases := []struct {
+		name        string
</code_context>

<issue_to_address>
**suggestion (testing):** OCI path validation tests are tightly coupled to error message substrings

In `Test_ValidateOCIPath`, the negative cases assert on specific substrings from `err.Error()`, which makes the tests brittle and tightly coupled to current error wording.

Consider instead:
- Asserting only that an error occurs, and at most that it indicates a broad category (e.g., "invalid path"), or
- Defining sentinel errors / error types and asserting with `errors.Is` or type assertions.

This keeps tests focused on behavior while allowing error messages to evolve safely.

Suggested implementation:

```golang
func Test_ValidateOCIPath(t *testing.T) {
	testCases := []struct {
		name        string
		path        string
		expectError bool
	}{

```

To fully decouple the tests from specific error message substrings, you should also:

1. **Update the table entries** in `Test_ValidateOCIPath`:
   - Remove the `errorMsg` value from each test case literal.
   - Keep only `name`, `path`, and `expectError` for each case.

2. **Update the assertion logic** in the test loop (where `testCases` is iterated):
   - Remove any references to `tc.errorMsg`.
   - Instead of `assert.Contains(t, err.Error(), tc.errorMsg)` (or similar), use broad checks like:
     ```go
     if tc.expectError {
         require.Error(t, err, "expected an error for test case %s", tc.name)
         // Optionally, if you have categories:
         // require.True(t, errors.Is(err, ErrInvalidOCIPath))
     } else {
         require.NoError(t, err, "did not expect an error for test case %s", tc.name)
     }
     ```
   - If there are multiple negative categories you care about and you already have sentinel errors or error types in production code, replace string checks with `errors.Is` or type assertions (e.g., `var e *InvalidPathError; assert.ErrorAs(t, err, &e)`).

These adjustments will ensure `Test_ValidateOCIPath` validates behavior (error vs. no error, and optionally broad error categories) without coupling to the exact error text.
</issue_to_address>

### Comment 8
<location> `controllers/lmes/lmevaljob_controller_validation_test.go:1733-1742` </location>
<code_context>
+func Test_ValidateUserInputWithOCI(t *testing.T) {
</code_context>

<issue_to_address>
**suggestion (testing):** Add a ValidateUserInput test for the conflicting OCI auth configuration case

Since `ValidateUserInput` now calls `ValidateOCIAuth`, please add a subtest to `Test_ValidateUserInputWithOCI` where `OCISpec` sets both `UsernameRef`/`PasswordRef` and `DockerConfigJsonRef`. This will verify that `ValidateUserInput` correctly propagates the "both username/password and dockerConfigJson" error already covered at the `ValidateOCIAuth` level.

Suggested implementation:

```golang
func Test_ValidateUserInputWithOCI(t *testing.T) {
	t.Run("ValidOCIConfiguration", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "hf",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"hellaswag"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						UsernameRef: &corev1.SecretKeySelector{

```

I can only see the beginning of `Test_ValidateUserInputWithOCI`, so I can’t safely edit the full function body. Below is the subtest you should add **inside** `Test_ValidateUserInputWithOCI`, as a sibling to the existing `"ValidOCIConfiguration"` subtest:

```go
	t.Run("ConflictingOCIAuthConfiguration", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "hf",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"hellaswag"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						// Intentionally set both username/password *and* dockerConfigJson
						UsernameRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{
								Name: "oci-creds",
							},
							Key: "username",
						},
						PasswordRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{
								Name: "oci-creds",
							},
							Key: "password",
						},
						DockerConfigJsonRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{
								Name: "oci-dockerconfig",
							},
							Key: ".dockerconfigjson",
						},
					},
				},
			},
		}

		err := ValidateUserInput(job)
		assert.Error(t, err, "Should reject conflicting OCI auth configuration")
		// Use the same or a substring of the error message used in ValidateOCIAuth
		assert.Contains(t, err.Error(), "both username/password and dockerConfigJson")
	})
```

Place this `t.Run("ConflictingOCIAuthConfiguration", ...)` block after the existing `"ValidOCIConfiguration"` subtest (but before the closing `}` of `Test_ValidateUserInputWithOCI`).

If the exact error message from `ValidateOCIAuth` differs (for example, `"cannot set both username/password and dockerConfigJson"`), update the string in `assert.Contains` to match the actual error message so the test verifies that `ValidateUserInput` properly propagates that validation error.
</issue_to_address>

### Comment 9
<location> `controllers/lmes/driver/driver_test.go:447-444` </location>
<code_context>
 	assert.Nil(t, driver.Shutdown())
 }
+
+func Test_OCIUploadSuccess(t *testing.T) {
+	info := setupTest(t, true)
+	defer info.tearDown(t)
+
+	// Set up environment variables for OCI
+	os.Setenv("OCI_REGISTRY", "registry.example.com")
+	defer func() {
+		os.Unsetenv("OCI_REGISTRY")
+		os.Unsetenv("OCI_PATH")
+	}()
+
+	driver, err := NewDriver(&DriverOption{
+		Context:     context.Background(),
+		OutputPath:  info.outputPath,
+		CatalogPath: info.catalogPath,
+		Logger:      driverLog,
+		Args:        []string{"sh", "-ec", "echo 'test completed'"},
+		CommPort:    info.port,
+		UploadToOCI: true,
+	})
+	assert.Nil(t, err)
+
+	// This will fail because the OCI script doesn't exist, but we can test the setup
+	msgs, _ := runDriverAndWait4Complete(t, driver, true)
+
+	// Should fail during OCI upload since script doesn't exist
+	assert.Contains(t, msgs[len(msgs)-1], "failed to upload results to OCI")
+
+	assert.Nil(t, driver.Shutdown())
+}
+
</code_context>

<issue_to_address>
**suggestion (testing):** Use t.Setenv and consider adding a positive-path test that simulates a successful OCI upload

In `Test_OCIUploadSuccess` (and the other OCI tests), use `t.Setenv` instead of `os.Setenv`/`os.Unsetenv` so env changes are scoped to each test and can’t leak between cases.

Also, this test only covers the failure path (missing OCI script). Consider adding a separate test for the success path of `uploadToOCI` by wiring in a temporary executable that returns 0 and asserting the call completes without error. That would exercise the happy path and verify the command wiring, not just the failure behavior.
</issue_to_address>


Comment on lines +343 to +345
repository:
description: Repository path (e.g., myorg/evaluation-results)
pattern: ^[a-zA-Z0-9._/-]*$

issue (bug_risk): Repository field is required but regex allows empty string, which may lead to invalid-but-accepted configuration

The CRD marks repository as required, but the pattern ^[a-zA-Z0-9._/-]*$ still permits an empty string, so invalid configs can pass validation and only fail at runtime. Please change the pattern to require at least one character (e.g. ^[a-zA-Z0-9._/-]+$) so empty values are rejected by the schema.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (8)
Dockerfile.lmes-job (1)

6-7: Good: Skopeo installation supports OCI upload functionality.

The skopeo installation is correctly placed in the root phase and includes proper cleanup. The command is appropriate for UBI9 and non-interactive CI/CD execution.

Optional consideration: For reproducibility and supply chain confidence, consider pinning the skopeo version (e.g., skopeo-<version>). Since the PR objectives confirm end-to-end testing was successful, this remains optional.
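A pinned variant might look like the following, assuming the existing dnf-based install on UBI9 — the version string here is a placeholder, not a tested release:

```dockerfile
RUN dnf install -y skopeo-1.14.* && dnf clean all
```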

config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml (3)

279-383: Add schema-level auth rules (CEL) to prevent misconfiguration.

Enforce that username/password are provided together, and not mixed with dockerConfigJson; allow anonymous when none provided.

                   oci:
                     description: Upload results to OCI registry
                     properties:
                       ...
                     required:
                     - registry
                     - repository
                     type: object
+                    x-kubernetes-validations:
+                    # username and password must appear together
+                    - rule: "has(self.username) == has(self.password)"
+                      message: "username and password must be set together (or both omitted)."
+                    # dockerConfigJson must not be combined with username/password
+                    - rule: "!(has(self.dockerConfigJson) && (has(self.username) || has(self.password)))"
+                      message: "dockerConfigJson cannot be combined with username/password."

323-342: TLS ergonomics: clarify registry field; default verifySSL; add optional caBundle.

  • “Registry URL” is actually a host (no scheme). Clarify to reduce confusion.
  • Default verifySSL to true (secure-by-default).
  • Provide caBundle SecretKeySelector (mirrors offline.s3.caBundle) for private CAs.
-                      registry:
-                        description: Registry URL (e.g., quay.io, registry.redhat.com)
+                      registry:
+                        description: Registry host (no scheme), e.g., quay.io, registry.redhat.com
                         properties:
                           key:
                             description: The key of the secret to select from.  Must
                               be a valid secret key.
                             type: string
                           name:
                             description: |-
                               Name of the referent.
                               More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
                             type: string
                           optional:
                             description: Specify whether the Secret or its key must
                               be defined
                             type: boolean
                         required:
                         - key
                         type: object
                         x-kubernetes-map-type: atomic
+                      caBundle:
+                        description: PEM-encoded CA bundle for the registry TLS chain (optional; use when verifySSL is true with private CAs)
+                        properties:
+                          key:
+                            description: The key of the secret to select from. Must be a valid secret key.
+                            type: string
+                          name:
+                            description: |-
+                              Name of the referent.
+                              More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
+                            type: string
+                          optional:
+                            description: Specify whether the Secret or its key must be defined
+                            type: boolean
+                        required:
+                        - key
+                        type: object
+                        x-kubernetes-map-type: atomic
                       verifySSL:
-                        description: Whether to verify SSL certificates
-                        type: boolean
+                        description: Whether to verify SSL certificates
+                        type: boolean
+                        default: true

Also applies to: 377-379


343-347: Correct the repository regex pattern to match Docker Distribution spec (remove redundant nesting).

The suggested pattern contains extra parentheses. The canonical Docker Distribution spec regex for repository names is: (?:[a-z0-9]+(?:[._-][a-z0-9]+)*/)*[a-z0-9]+(?:[._-][a-z0-9]+)*, which correctly enforces lowercase-only components separated by slashes, with each component allowing dots, underscores, or hyphens as separators between alphanumeric segments.

Apply:

-                      repository:
-                        description: Repository path (e.g., myorg/evaluation-results)
-                        pattern: ^[a-zA-Z0-9._/-]*$
-                        type: string
+                      repository:
+                        description: Repository path (lowercase), e.g., myorg/evaluation-results
+                        minLength: 1
+                        pattern: ^(?:[a-z0-9]+(?:[._-][a-z0-9]+)*/)*[a-z0-9]+(?:[._-][a-z0-9]+)*$
+                        type: string

Per Docker Distribution spec, repository names must use lowercase alphanumerics with optional dots/underscores/hyphens as segment separators, slash-delimited components, and cannot be empty.
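A quick way to sanity-check the canonical pattern against representative repository names (a Go sketch; the sample names are illustrative):

```go
package main

import (
	"fmt"
	"regexp"
)

// Canonical Docker Distribution repository-name pattern quoted above.
var repoRe = regexp.MustCompile(`^(?:[a-z0-9]+(?:[._-][a-z0-9]+)*/)*[a-z0-9]+(?:[._-][a-z0-9]+)*$`)

func main() {
	for _, name := range []string{
		"myorg/evaluation-results", // valid: lowercase, hyphenated segment
		"a_b.c-d/e",                // valid: dots/underscores/hyphens as separators
		"MyOrg/Results",            // invalid: uppercase rejected
		"a//b",                     // invalid: empty path component
	} {
		fmt.Printf("%-26s %v\n", name, repoRe.MatchString(name))
	}
}
```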

controllers/lmes/driver/driver.go (1)

70-70: OCI upload wiring looks correct; consider small robustness & clarity tweaks

The UploadToOCI flag and uploadToOCI() flow are consistent with how updateCompleteStatus handles post-processing failures (propagating the error and marking the job as failed), and the tests in driver_test.go clearly exercise those cases.

Two minor points you may want to address:

  • uploadToOCI assumes d.Option is non-nil; that’s guaranteed by NewDriver, but adding a defensive guard keeps the helper safer for any future direct uses:
-func (d *driverImpl) uploadToOCI() error {
-	if d == nil || !d.Option.UploadToOCI {
+func (d *driverImpl) uploadToOCI() error {
+	if d == nil || d.Option == nil || !d.Option.UploadToOCI {
 		return nil
 	}
  • You build cmd := []string{...} only for debug printing and then re-specify the same arguments in exec.Command. To avoid divergence, consider reusing cmd via variadic expansion:
-	cmd := []string{"python", "/opt/app-root/src/scripts/oci.py", registryFromEnv, resultsLocation}
-	fmt.Printf("[DEBUG] OCI upload CLI: %v\n", cmd)
-	output, err := exec.Command(
-		"python",
-		"/opt/app-root/src/scripts/oci.py",
-		registryFromEnv,
-		resultsLocation,
-	).Output()
+	cmd := []string{"python", "/opt/app-root/src/scripts/oci.py", registryFromEnv, resultsLocation}
+	fmt.Printf("[DEBUG] OCI upload CLI: %v\n", cmd)
+	output, err := exec.Command(cmd[0], cmd[1:]...).Output()

These are non-blocking but improve maintainability and defensive behavior.

Also applies to: 246-302, 506-509

controllers/lmes/lmevaljob_controller_validation_test.go (1)

1493-1798: Good OCI validation coverage; align with actual ValidateUserInput behavior

The new tests around HasOCIOutput/HasOCI, OCISpec helper methods, ValidateOCIPath, ValidateOCIAuth, and ValidateUserInput with OCI are well-structured and exercise the happy and unhappy paths.

One thing to keep in mind: ValidateUserInput currently only calls ValidateOCIAuth when HasOCIOutput() is true; ValidateOCIPath is tested in isolation but not wired into the main validation flow yet. Either:

  • Intentionally keep OCI paths internal (e.g., fixed “results” subpath) and update comments mentioning “path and authentication” to avoid confusion, or
  • When you later introduce a user-configurable OCI path field, remember to hook ValidateOCIPath into ValidateUserInput alongside ValidateOCIAuth.

Given the tests already pin error substrings, any future change to the wired behavior should re-run these tests carefully.

controllers/lmes/validation.go (1)

115-120: OCI validation logic is sound; tighten path handling and keep comments in sync

The OCI validation pieces are generally in good shape:

  • ValidateOCIAuth correctly enforces “exactly one of username/password or dockerConfigJson” and ensures both username and password are present when that method is chosen.
  • ValidateUserInput sensibly gates OCI auth validation behind job.Spec.HasOCIOutput() so Outputs/OCISpec can’t be nil.

Two refinements to consider:

  1. Comment vs behavior in ValidateUserInput

    The comment says “Validate OCI path and authentication”, but only ValidateOCIAuth is called:

    // Validate OCI path and authentication
    if job.Spec.HasOCIOutput() {
        if err := ValidateOCIAuth(job.Spec.Outputs.OCISpec); err != nil {
            return fmt.Errorf("invalid OCI authentication: %w", err)
        }
    }

    Until you actually validate a user-supplied OCI path (or add such a field), either update the comment or add the appropriate ValidateOCIPath(...) call where that path will live.

  2. Stricter ValidateOCIPath to avoid escaping the output root

    ValidateOCIPath currently rejects traversal sequences and shell metacharacters, but an absolute path like /tmp/foo will pass the regex and could defeat the usual filepath.Join(base, path) protections by discarding the base.

    You can cheaply harden this by rejecting absolute paths:

    func ValidateOCIPath(path string) error {
        if path == "" {
            return nil // Empty path is valid for root
        }

        // Disallow absolute paths to ensure we always stay under the configured output root
        if strings.HasPrefix(path, "/") {
            return fmt.Errorf("OCI path must be relative")
        }
        ...
    }

    This keeps the validator aligned with the intended “subdirectory under output root” usage if/when OCI paths become configurable.

Also applies to: 540-592
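A self-contained sketch of the hardened validator, using filepath.Clean to catch traversal that escapes the root. Names are illustrative; the repository's ValidateOCIPath additionally rejects shell metacharacters:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// validateOCIPath accepts "" (root) and relative paths that stay under the
// output root; absolute paths and escaping traversal are rejected.
func validateOCIPath(path string) error {
	if path == "" {
		return nil // empty path means the output root itself
	}
	if strings.HasPrefix(path, "/") {
		return fmt.Errorf("OCI path must be relative")
	}
	// Clean collapses "a/../b"-style segments; anything that still starts
	// with ".." would climb above the root after filepath.Join(base, path).
	clean := filepath.Clean(path)
	if clean == ".." || strings.HasPrefix(clean, "../") {
		return fmt.Errorf("OCI path must not escape the output root")
	}
	return nil
}

func main() {
	for _, p := range []string{"", "results/subdir", "/tmp/foo", "a/../../b"} {
		fmt.Printf("%-14q %v\n", p, validateOCIPath(p))
	}
}
```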

controllers/lmes/driver/driver_test.go (1)

447-586: OCI upload tests match driver behavior; harden env setup to avoid flakiness

The new OCI-focused tests exercise the key paths in uploadToOCI() and the UploadToOCI flag nicely (disabled, missing registry, and script failure cases), and they align with the implementation in driver.go.

One small robustness issue: in Test_OCIUploadToOCIFunction, the "missing registry env" case assumes OCI_REGISTRY is not set in the process environment:

{
    name:           "missing registry env",
    uploadToOCI:    true,
    registryEnv:    "",
    expectError:    true,
    expectedErrMsg: "OCI_REGISTRY environment variable not set",
},
…
if tt.registryEnv != "" {
    os.Setenv("OCI_REGISTRY", tt.registryEnv)
    defer os.Unsetenv("OCI_REGISTRY")
}

If OCI_REGISTRY happens to be set externally when tests run, this case will unexpectedly see a non-empty registry and exercise the “script execution fails” path instead.

To make the tests deterministic, explicitly clear the env at the start of each subtest (or use t.Setenv if you’re on Go 1.17+):

for _, tt := range tests {
    t.Run(tt.name, func(t *testing.T) {
-       // Set up environment
-       if tt.registryEnv != "" {
-           os.Setenv("OCI_REGISTRY", tt.registryEnv)
-           defer os.Unsetenv("OCI_REGISTRY")
-       }
+       // Ensure a clean baseline for OCI_REGISTRY
+       os.Unsetenv("OCI_REGISTRY")
+       if tt.registryEnv != "" {
+           os.Setenv("OCI_REGISTRY", tt.registryEnv)
+           defer os.Unsetenv("OCI_REGISTRY")
+       }
        …
    })
}

This keeps the “missing registry env” test stable regardless of the running environment.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5667eff and fa8b8de.

📒 Files selected for processing (12)
  • .github/workflows/build-and-push.yaml (1 hunks)
  • Dockerfile.lmes-job (1 hunks)
  • api/lmes/v1alpha1/lmevaljob_types.go (4 hunks)
  • api/lmes/v1alpha1/zz_generated.deepcopy.go (2 hunks)
  • cmd/lmes_driver/main.go (2 hunks)
  • config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml (1 hunks)
  • controllers/lmes/driver/driver.go (3 hunks)
  • controllers/lmes/driver/driver_test.go (1 hunks)
  • controllers/lmes/lmevaljob_controller.go (6 hunks)
  • controllers/lmes/lmevaljob_controller_test.go (1 hunks)
  • controllers/lmes/lmevaljob_controller_validation_test.go (2 hunks)
  • controllers/lmes/validation.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
controllers/lmes/validation.go (1)
api/lmes/v1alpha1/lmevaljob_types.go (2)
  • Outputs (396-406)
  • OCISpec (469-496)
controllers/lmes/driver/driver_test.go (3)
controllers/lmes/driver/driver.go (2)
  • NewDriver (117-141)
  • DriverOption (58-77)
cmd/lmes_driver/main.go (1)
  • OutputPath (37-37)
controllers/lmes/constants.go (1)
  • OutputPath (28-28)
controllers/lmes/lmevaljob_controller_validation_test.go (1)
controllers/lmes/validation.go (3)
  • ValidateOCIPath (541-565)
  • ValidateOCIAuth (568-592)
  • ValidateUserInput (76-153)
api/lmes/v1alpha1/zz_generated.deepcopy.go (1)
api/lmes/v1alpha1/lmevaljob_types.go (1)
  • OCISpec (469-496)
controllers/lmes/lmevaljob_controller_test.go (1)
controllers/lmes/lmevaljob_controller.go (1)
  • CreatePod (890-1335)
controllers/lmes/driver/driver.go (2)
cmd/lmes_driver/main.go (1)
  • OutputPath (37-37)
controllers/lmes/constants.go (1)
  • OutputPath (28-28)
controllers/lmes/lmevaljob_controller.go (2)
controllers/lmes/config.go (1)
  • Options (37-48)
api/lmes/v1alpha1/lmevaljob_types.go (2)
  • Outputs (396-406)
  • OCISpec (469-496)
🪛 GitHub Actions: Controller Tests
controllers/lmes/lmevaljob_controller_test.go

[error] 3979-3979: OCI_PATH env var missing in OCI pod configuration.


[error] 3989-3989: OCI_PATH value is empty; expected 'results'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Sourcery review
  • GitHub Check: Sourcery review
🔇 Additional comments (12)
.github/workflows/build-and-push.yaml (1)

174-176: Approved: diagnostic logging for CI manifest validation.

These additions help validate that the sed substitutions for image references were applied correctly before pushing the CI manifests. The timing—after modifications, before the repository push—makes sense for catching configuration issues early.

Assuming these params.env files contain only image repository references and paths (not credentials or API keys), this is a low-risk observability enhancement.

api/lmes/v1alpha1/zz_generated.deepcopy.go (2)

436-470: LGTM! Auto-generated deepcopy implementation is correct.

The generated deepcopy methods properly handle all OCISpec fields:

  • Scalar fields (Repository, Tag, Subject) are copied via the shallow copy on line 438
  • The nested Registry struct is deep-copied on line 439
  • Optional pointer fields (UsernameRef, PasswordRef, DockerConfigJsonRef, VerifySSL) are deep-copied with appropriate nil checks

The implementation follows the standard pattern for Kubernetes deepcopy generation.


556-560: LGTM! Outputs deepcopy correctly handles the OCISpec field.

The extension properly checks for nil, allocates a new OCISpec instance, and delegates to the OCISpec.DeepCopyInto method.

cmd/lmes_driver/main.go (2)

63-63: LGTM! Flag definition follows existing patterns.

The upload-to-oci flag is correctly defined as a bool pointer with a sensible default value of false, consistent with other flags in the file (e.g., download-assets-s3 on line 62).


133-133: LGTM! Flag value properly propagated to driver configuration.

The uploadToOCI flag value is correctly passed to the DriverOption struct, enabling the OCI upload feature when the flag is set.

config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml (1)

279-383: Nice addition — OCI outputs look consistent with the API.

Structure mirrors other SecretKeySelector usages and fits the new feature. Once the above validation/pattern tweaks are in, I’m good.

If a webhook/defaulting layer already enforces these rules, confirm and we can keep the CRD lenient.

api/lmes/v1alpha1/lmevaljob_types.go (1)

396-406: OCISpec and helper methods are well-shaped and align with controller/validation usage

The new OCI API surface in v1alpha1 looks solid:

  • Outputs gaining OCISpec *OCISpec 'json:"oci,omitempty"' cleanly extends the schema without breaking existing PVC-based outputs.
  • OCISpec’s fields and kubebuilder patterns match the usage in the controller (env var generation) and tests (tag/subject syntax, auth selectors, VerifySSL).
  • Helper methods (LMEvalJobSpec.HasOCIOutput, OCISpec.HasUsernamePassword, OCISpec.HasDockerConfigJson, and Outputs.HasOCI) are simple and used correctly in validation and pod construction to avoid nil dereferences.

I don’t see any structural or compatibility concerns here; this is a good API addition.

Also applies to: 469-496, 620-623, 628-635, 651-654

controllers/lmes/lmevaljob_controller.go (5)

279-290: LGTM!

Adding logging after configmap parsing and DSC configuration application aids debugging. The logged Options struct contains operational settings (images, intervals, batch sizes, ports) without sensitive data.


914-920: LGTM!

The refactored condition correctly separates PVC-based output handling from OCI output handling. Only mounting the outputs volume when a PVC is actually configured (managed or existing) is the right approach since OCI uploads use environment variables and a separate docker-config volume instead.

Also applies to: 930-949


1534-1536: LGTM!

The --upload-to-oci flag is appropriately added to the driver command when OCI output is configured. The driver will use the OCI-related environment variables (OCI_REGISTRY, OCI_REPOSITORY, OCI_TAG, etc.) that are set earlier in CreatePod.


1207-1222: The guard is sufficient—HasUsernamePassword() validates both pointers are non-nil.

The implementation confirms that HasUsernamePassword() returns true only when both UsernameRef and PasswordRef are non-nil. The code is safe from nil pointer dereference.


1223-1247: I'll verify the OCI upload script to determine if the DOCKER_CONFIG environment variable is actually needed for docker config authentication.
<function_calls>


#!/bin/bash

Find driver and OCI upload related files

find . -type f -name "driver" -o -name "upload" -o -name "oci" | grep -E ".(sh|py|go)$" | head -20


</function_calls>

Let me search more broadly in the codebase:
<function_calls>


#!/bin/bash

Search for the driver directory structure

find . -type d -name "driver" | head -5


</function_calls>

Let me look for references to the upload script in the controller file:
<function_calls>


#!/bin/bash

Search for upload-to-oci references in the controller

rg -n "upload.*oci|upload-to-oci|driver" controllers/lmes/lmevaljob_controller.go | head -20


</function_calls>

Let me get a broader view of the directory structure:
<function_calls>


#!/bin/bash

List top-level and relevant subdirectories

ls -la controllers/lmes/ 2>/dev/null | head -20
find controllers/lmes -type f -name "*.go" | head -10


</function_calls>

Let me search for how the docker config is used in the codebase:
<function_calls>


#!/bin/bash

Search for DOCKER_CONFIG or docker config usage patterns

rg -n "DOCKER_CONFIG|docker.*config|.docker" --type go | head -30


</function_calls>

Let me examine the full context around the docker config mounting code:
<function_calls>


#!/bin/bash

Get more context around the code section

sed -n '1200,1260p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me look for where environment variables are set for OCI operations:
<function_calls>


#!/bin/bash

Search for ociEnvVars or environment variable setup

rg -n "ociEnvVars|Env.*append" controllers/lmes/lmevaljob_controller.go -A 2 -B 2


</function_calls>

Let me search for any scripts or tools that handle OCI operations:
<function_calls>


#!/bin/bash

Look for shell scripts that might be the OCI driver

find . -type f ( -name "*.sh" -o -name "driver" ) | grep -v node_modules | head -20


</function_calls>

Let me check what files reference the docker-config volume mount:
<function_calls>


#!/bin/bash

Search for references to docker-config

rg -n "docker-config" --type go


</function_calls>

Let me look at the broader file structure to understand the codebase better:
<function_calls>


#!/bin/bash

Get file size and structure

wc -l controllers/lmes/lmevaljob_controller.go

Look at the entire section where OCI handling happens

sed -n '1160,1270p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me search for how the OCI spec container command is built:
<function_calls>


#!/bin/bash

Search for container commands and how they're executed

rg -n "Command|Args|--upload-to-oci" controllers/lmes/lmevaljob_controller.go -B 2 -A 2


</function_calls>

Let me search for what calls this controller function and understand the full flow:
<function_calls>


#!/bin/bash

Find the function definition that contains this code

sed -n '1100,1150p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me look further back to understand the full context:
<function_calls>


#!/bin/bash

Search for function definition context

rg -n "func.(.).*{" controllers/lmes/lmevaljob_controller.go | tail -20


</function_calls>

Let me look at a wider section to find the function name:
<function_calls>


#!/bin/bash

Get lines 1000-1050 to see function definition

sed -n '1000,1050p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me find environment variable setup in this file:
<function_calls>


#!/bin/bash

Search for env setup patterns

rg -n "EnvVar{|EnvVar\s*{" controllers/lmes/lmevaljob_controller.go -A 3 | head -50


</function_calls>

Let me search more specifically for how authentication environment variables are set:
<function_calls>


#!/bin/bash

Search around the docker config auth section

sed -n '1150,1300p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me search for whether DOCKER_CONFIG is used elsewhere in the repo:
<function_calls>


#!/bin/bash

Search for DOCKER_CONFIG usage anywhere

rg "DOCKER_CONFIG" --type go


</function_calls>

Let me look for documentation or comments about OCI authentication:
<function_calls>


#!/bin/bash

Search for OCI auth related comments

rg -n "OCI|auth|docker.*config" controllers/lmes/lmevaljob_controller.go -i -B 1 -A 1 | head -80


</function_calls>

Let me check if there are any references to standard docker authentication mechanisms:
<function_calls>


#!/bin/bash

Look for registry authentication or credential handling

rg -n "username|password|auth|credential" controllers/lmes/lmevaljob_controller.go -i | head -20


</function_calls>

Now let me search for what tools/scripts might be consuming these environment variables:
<function_calls>


#!/bin/bash

Search all files for references to the mount path or docker config

rg -n "/tmp/.docker" --type go


</function_calls>

Let me search for scripts or tools that might be doing OCI operations:
<function_calls>


#!/bin/bash

Find all shell scripts in the repo

fd -e sh -e bash | head -20


</function_calls>

Let me search for references to common OCI tools:
<function_calls>


#!/bin/bash

Search for oras, crane, skopeo references

rg -n "oras|crane|skopeo|oras push|oras pull" --type go --type sh


</function_calls>

Let me check what's in the Outputs.OCISpec structure:
<function_calls>


#!/bin/bash

Search for OCISpec type definition

rg -n "type.*OCISpec|struct.*OCISpec" --type go -A 15


</function_calls>

Let me look at the surrounding code to understand what container is being built:
<function_calls>


#!/bin/bash

Get the section from 1050 to 1100 to see container setup

sed -n '1050,1100p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me find where the container spec is actually built:
<function_calls>


#!/bin/bash

Search for Container or PodSpec setup

rg -n "Container{|PodSpec{|.Spec..*Container" controllers/lmes/lmevaljob_controller.go | head -10


</function_calls>

Let me get a comprehensive view of the section with proper context:
<function_calls>


#!/bin/bash

Get comprehensive section with line numbers

sed -n '1220,1250p' controllers/lmes/lmevaljob_controller.go | cat -n


</function_calls>

Let me search for how these volumes and volumeMounts are used:
<function_calls>


#!/bin/bash

Search for where volumeMounts is assigned to a container

rg -n "volumeMounts.=|.VolumeMounts.=" controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me search for the actual container being configured:
<function_calls>


#!/bin/bash

Search for container definition around the volume code

sed -n '1270,1400p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me search for what script or tool is being called:
<function_calls>


#!/bin/bash

Search for Command or Args that might call the upload script

sed -n '1350,1450p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me find how this function ends to see the full picture:
<function_calls>


#!/bin/bash

Get the line count and search for function boundaries

tail -50 controllers/lmes/lmevaljob_controller.go | head -30

Also find where this function returns

rg -n "return" controllers/lmes/lmevaljob_controller.go | tail -10


</function_calls>

Let me search more broadly for how OCI upload is invoked:
<function_calls>


#!/bin/bash

Search for --upload-to-oci or similar flags that might indicate script invocation

rg -n "upload.*oci|--upload" --type go --type sh


</function_calls>

Let me check if there's a setup.py or requirements file that might indicate what tools are used:
<function_calls>


#!/bin/bash

Search for setup files or requirements

find . -name "setup.py" -o -name "requirements.txt" -o -name "Dockerfile" | head -10


</function_calls>

Let me look at the Dockerfile if it exists:
<function_calls>


#!/bin/bash

Find and read relevant Dockerfiles

find . -name "Dockerfile*" -type f | head -5


</function_calls>

Let me check if there's documentation on how OCI upload works:
<function_calls>


#!/bin/bash

Search for README or docs

find . -name "README*" -o -name "*.md" | grep -i oci


</function_calls>

Let me try a different approach - check the Spec structure to understand what's available:
<function_calls>


#!/bin/bash

Look for test files that might show how OCI is used

find . -name "test.go" -type f | xargs grep -l "OCISpec|docker-config" 2>/dev/null | head -5


</function_calls>

Let me search in a wider context for how the container command is built in this controller:
<function_calls>


#!/bin/bash

Get the full file to understand the structure

wc -l controllers/lmes/lmevaljob_controller.go

Search for where containers are defined

rg -n "containers.:=|containers.=" controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me look at broader sections to find the container creation:
<function_calls>


#!/bin/bash

Get sections to find container setup

sed -n '1400,1500p' controllers/lmes/lmevaljob_controller.go | cat -n


</function_calls>

Let me try to find the complete context of the function being reviewed:
<function_calls>


#!/bin/bash

Search for function definition around line 1223

sed -n '900,1000p' controllers/lmes/lmevaljob_controller.go | grep -n "func"


</function_calls>

Let me search for the function signature:
<function_calls>


#!/bin/bash

Search backwards from line 1223 for function definition

sed -n '1,1223p' controllers/lmes/lmevaljob_controller.go | tail -30 | head -20


</function_calls>

Let me search for what function this code is in:
<function_calls>


#!/bin/bash

Get context before line 1223 to find function

sed -n '1180,1225p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Now let me look at the code after the docker-config section to see where these volumes get used:
<function_calls>


#!/bin/bash

See what happens after the docker-config setup

sed -n '1240,1350p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me search for what environment variables are actually being set for the container:
<function_calls>


#!/bin/bash

Search for all EnvVar creations in this file

rg -n "Name:.*DOCKER|Name:.*OCI|Name:.*docker" controllers/lmes/lmevaljob_controller.go -B 1 -A 1


</function_calls>

Let me search for authentication-related environment variables:
<function_calls>


#!/bin/bash

Search more broadly for all EnvVar blocks

sed -n '1100,1300p' controllers/lmes/lmevaljob_controller.go | grep -n "EnvVar|Name:.*Value:" -A 1


</function_calls>

Let me see the complete env var setup section:
<function_calls>


#!/bin/bash

Get a larger section to see all env var setup

sed -n '1150,1350p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Perfect! Now let me search for references to how this container gets invoked and what tool it might use:
<function_calls>


#!/bin/bash

Search for what image or command this container runs

sed -n '1350,1500p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me search for the image being used:
<function_calls>


#!/bin/bash

Look for Image field in the container

rg -n "Image.:=|.Image.=" controllers/lmes/lmevaljob_controller.go | head -10


</function_calls>

Let me look for the complete container spec definition:
<function_calls>


#!/bin/bash

Get a wider section starting from where volumes end

sed -n '1245,1400p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Now let me search for what tools or commands might need DOCKER_CONFIG:
<function_calls>


#!/bin/bash

Search for common OCI tool usage patterns

rg -n "oras|crane|skopeo|push|pull" --type go -C 2 | head -40


</function_calls>

Let me check if there are any python or shell scripts that handle OCI operations:
<function_calls>


#!/bin/bash

Find all python and shell scripts

find . -type f ( -name ".py" -o -name ".sh" ) | grep -v node_modules | head -15


</function_calls>

Let me search for references to docker or registry authentication in python/shell files:
<function_calls>


#!/bin/bash

Search python/sh files for docker or OCI

find . -type f ( -name ".py" -o -name ".sh" ) -exec grep -l "docker|oras|registry" {} ; 2>/dev/null | head -10


</function_calls>

Let me look for imports or references in the codebase about how docker authentication is typically handled:
<function_calls>


#!/bin/bash

Search for docker config related imports or libraries

rg -n "docker|auth|credential" controllers/lmes/ --type go | grep -i "import|package|from" | head -10


</function_calls>

Let me look at how the DockerConfigJsonRef is used elsewhere:
<function_calls>


#!/bin/bash

Search for DockerConfigJsonRef usage

rg -n "DockerConfigJsonRef" --type go -B 2 -A 2


</function_calls>

Let me check if there are API type definitions that show what these fields are:
<function_calls>


#!/bin/bash

Find API type definitions

find . -name "types.go" -o -name "spec.go" | head -10


</function_calls>

Let me search for where Outputs is defined:
<function_calls>


#!/bin/bash

Search for OCIOutputSpec or similar type definitions

rg -n "type.*Output.*struct|type.*OCI.*struct" --type go -A 10 | head -50


</function_calls>

Let me check if there's a standard way docker config is handled in Kubernetes volumes:
<function_calls>

Kubernetes docker config.json mount path authentication DOCKER_CONFIG environment variable

</function_calls>

Let me also search for how common container tools expect docker config:
<function_calls>

oras crane docker config authentication /tmp/.docker/config.json

</function_calls>

Let me search for standard docker config mounting patterns in the codebase:
<function_calls>


#!/bin/bash

Search for all Secret volume mounts

rg -n "SecretVolumeSource|SecretName" --type go -B 5 -A 5 | head -100


</function_calls>

Let me check if there are other examples of how credentials are passed in this codebase:
<function_calls>


#!/bin/bash

Search for all credential or auth mounting patterns

rg -n "auth|credential|password|username" --type go -B 2 -A 2 controllers/lmes/ | head -80


</function_calls>

Let me check what happens with the username/password authentication to see if environment variables are set there:
<function_calls>


#!/bin/bash

Get the username/password auth section

sed -n '1205,1223p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Now let me check if there's any documentation or examples about docker config setup:
<function_calls>

"DOCKER_CONFIG" environment variable kubernetes pod docker authentication

</function_calls>

Let me search for what tools are actually used for OCI registry push/pull operations in this project:
<function_calls>


#!/bin/bash

Search the entire repo for any OCI/registry operations

rg -n "push|pull|registry|image" controllers/lmes/lmevaljob_controller.go | head -30


</function_calls>

Let me look for the function name and its full implementation:
<function_calls>


#!/bin/bash

Get function signature and broader context

sed -n '1150,1160p' controllers/lmes/lmevaljob_controller.go
sed -n '1000,1020p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me get the function name by looking for the nearest func before line 1223:
<function_calls>


#!/bin/bash

Find the function containing line 1223

awk 'NR<=1223 && /^func/ {funcline=NR; funcname=$0} NR==1223 {print "Function at line " funcline ": " funcname}' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me examine the env variable setup more carefully:
<function_calls>


#!/bin/bash

Get the complete section with env setup, looking for where ociEnvVars is used

sed -n '1160,1260p' controllers/lmes/lmevaljob_controller.go | cat -n


</function_calls>

Let me search for the specific function to understand it better:
<function_calls>


#!/bin/bash

# Search for the function definition that contains this code

rg -n "ociEnvVars" controllers/lmes/lmevaljob_controller.go -B 30 | head -40


</function_calls>

Perfect! Now let me see the full initialization of ociEnvVars:
<function_calls>


#!/bin/bash

# Get the section where ociEnvVars is first defined and appended to

sed -n '1160,1220p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Now let me find where this ociEnvVars array is assigned to the container:
<function_calls>


#!/bin/bash

# Search for where ociEnvVars is used in container spec

rg -n "ociEnvVars" controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me look after the volumes section:
<function_calls>


#!/bin/bash

# Get the section after line 1247 where container is set up

sed -n '1247,1280p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me search for the complete section to see where the container env is assigned:
<function_calls>


#!/bin/bash

# Search for the Env.*: pattern

rg -n ".Env\s*:" controllers/lmes/lmevaljob_controller.go -A 2 -B 2


</function_calls>

Let me look at the function completion to see how the container is finalized:
<function_calls>


#!/bin/bash

# Get the last 100 lines of the file to find where the return is

tail -100 controllers/lmes/lmevaljob_controller.go | head -50


</function_calls>

Let me search more specifically:
<function_calls>


#!/bin/bash

# Find where the Env field is set on a corev1.Container

rg -n "corev1\.Container\{" controllers/lmes/lmevaljob_controller.go -A 15 | head -40


</function_calls>

Let me search for the container creation more systematically:
<function_calls>


#!/bin/bash

# Get container struct definition

sed -n '1250,1350p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me search for where volumeMounts and volumes arrays are actually assigned to the container/pod:
<function_calls>


#!/bin/bash

# Find where volumeMounts and volumes are used

rg -n "volumeMounts:|volumes:" controllers/lmes/lmevaljob_controller.go -B 1 -A 1


</function_calls>

Let me look at the complete container/pod spec building section:
<function_calls>


#!/bin/bash

# Get a wider context from line 1250 onwards

sed -n '1245,1350p' controllers/lmes/lmevaljob_controller.go | cat -n


</function_calls>

Let me check if there's initialization of volumeMounts and volumes arrays:
<function_calls>


#!/bin/bash

# Search for volumeMounts and volumes initialization

rg -n "volumeMounts\s*:=|volumes\s*:=" controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me look at the beginning of this section to find where arrays are initialized:
<function_calls>


#!/bin/bash

# Get the section from 1160-1180 to see initialization

sed -n '1160,1180p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Now let me find where the container Env field gets set with ociEnvVars:
<function_calls>


#!/bin/bash

# Search for Env: ociEnvVars or similar

rg -n "Env:\s*ociEnvVars|Env.*ociEnvVars" controllers/lmes/lmevaljob_controller.go -B 2 -A 2


</function_calls>

Let me look for the complete pattern:
<function_calls>


#!/bin/bash

# Get the section where the container is being built with Env

sed -n '1255,1350p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me check if ociEnvVars is just being appended to:
<function_calls>


#!/bin/bash

# Check the complete setup of ociEnvVars again with more lines

rg -n "ociEnvVars\s*:=|ociEnvVars.*append" controllers/lmes/lmevaljob_controller.go -B 1 -A 5 | head -100


</function_calls>

Let me get the exact lines that this volume mount code references:
<function_calls>


#!/bin/bash

# Get the complete section from init to return

sed -n '1050,1350p' controllers/lmes/lmevaljob_controller.go | grep -nE "func|ociEnvVars|docker-config|volumeMounts|volumes|return" | head -40


</function_calls>

Let me search for what the function returns to understand structure:
<function_calls>


#!/bin/bash

# Search for the return statement and what it returns

rg -n "return.*\{" controllers/lmes/lmevaljob_controller.go | tail -20


</function_calls>

Let me look for how the pod is constructed:
<function_calls>


#!/bin/bash

# Look for PodSpec or Pod assignment

sed -n '1300,1350p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me try to understand the complete flow by looking at large sections:
<function_calls>


#!/bin/bash

# Get sections to understand data flow

echo "=== Lines 1160-1170 ===" && sed -n '1160,1170p' controllers/lmes/lmevaljob_controller.go
echo "=== Lines 1220-1250 ===" && sed -n '1220,1250p' controllers/lmes/lmevaljob_controller.go
echo "=== Lines 1300-1320 ===" && sed -n '1300,1320p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Let me now check if DOCKER_CONFIG is set anywhere currently:
<function_calls>


#!/bin/bash

# Check if DOCKER_CONFIG is set in the current code

rg -n "DOCKER_CONFIG" controllers/lmes/


</function_calls>

Let me verify what happens with the username/password case to see if env vars are used differently:
<function_calls>


#!/bin/bash

# Get the username/password section

sed -n '1200,1224p' controllers/lmes/lmevaljob_controller.go


</function_calls>

Now let me check the web search results about DOCKER_CONFIG usage:
<function_calls>

Docker config authentication volume mount /tmp/.docker/config.json works without DOCKER_CONFIG env var

</function_calls>

Let me check if there's any reference documentation in the repo:
<function_calls>


#!/bin/bash

# Search for any docs or README files

find . -name "*.md" -o -name "README" | xargs grep -lE "docker|OCI|auth" 2>/dev/null | head -10


</function_calls>

Let me look for how container tools typically access docker config:
<function_calls>

"docker login" /tmp/.docker/config.json kubernetes container workdir home directory

</function_calls>

Let me verify by checking how Kubernetes secret volume mounting typically works:
<function_calls>

Kubernetes secret volume mount SubPath MountPath /tmp/.docker/config.json how does it work

</function_calls>

Let me look for any CI/CD files that might show how the container is actually invoked:
<function_calls>


#!/bin/bash

# Search for CI/CD or test files that might show actual usage

find . -name "*.yaml" -o -name "*.yml" -o -name ".github" -type d | head -10


</function_calls>

Let me check if there's a values file or example that shows how this is used:
<function_calls>


#!/bin/bash

# Search for example yamls or test cases

find . -name "*example*" -o -name "*test*" -o -name "*sample*" | grep -E "\.(yaml|yml|json)$" | head -10


</function_calls>

Let me check the exact Kubernetes behavior with SubPath and MountPath:
<function_calls>

Kubernetes VolumeMount SubPath MountPath file /tmp/.docker/config.json secret key

</function_calls>

Now let me check if the actual criticism in the review is valid by researching common practices:
<function_calls>

oras push crane push DOCKER_CONFIG environment variable .docker/config.json authentication

</function_calls>

Based on my research, let me verify one more important thing about how this should work:
<function_calls>


#!/bin/bash

# Check if there's any documentation or comments in the code about how this works

rg -n "docker.*config|config.json|authentication" controllers/lmes/lmevaljob_controller.go -B 2 -A 2 | head -60


</function_calls>

Let me also check if there's an init container or setup that might be creating the proper directory structure:
<function_calls>


#!/bin/bash

# Search for initContainers or any setup logic

rg -n "InitContainers|init.*container|mkdir|ln -s" controllers/lmes/lmevaljob_controller.go -i


</function_calls>

Let me check if there are any other volume mounts that might shed light on how this works:
<function_calls>


#!/bin/bash

# Look for all VolumeMount definitions

rg -n "VolumeMount\{" controllers/lmes/lmevaljob_controller.go -A 5 | head -60


</function_calls>

Now, let me check what the web search revealed about whether DOCKER_CONFIG is actually required for common OCI tools:
<function_calls>

does DOCKER_CONFIG environment variable required docker authentication or works with ~/.docker/config.json

</function_calls>

Let me verify by checking the official documentation for standard OCI tools:
<function_calls>

"oras" "docker config" authentication how it finds config file

</function_calls>

Based on my code analysis and web research, here is my conclusion:

Additional investigation needed: Container tool and image not identified in codebase

The review comment suggests adding DOCKER_CONFIG=/tmp/.docker when using dockerConfigJson authentication, citing that OCI tools (oras, crane, skopeo) typically look for Docker config at ~/.docker/config.json by default or respect the DOCKER_CONFIG environment variable.

However, my verification reveals:

  1. Mount Configuration: The code correctly mounts the docker-config secret to /tmp/.docker/config.json using SubPath, which is valid Kubernetes syntax.

  2. No DOCKER_CONFIG Currently Set: The codebase does not currently set a DOCKER_CONFIG environment variable anywhere.

  3. Standard OCI Tool Behavior: Web research confirms that common OCI tools (docker, oras, crane) by default look for config at ~/.docker/config.json OR can be directed via DOCKER_CONFIG environment variable to a specific directory.

  4. Unable to Locate Tool: I could not find what specific tool or script actually consumes this mounted config file in the codebase, which means I cannot definitively confirm whether DOCKER_CONFIG is necessary.

The review's suggestion is reasonable given standard docker/OCI tool behavior, but without identifying the actual container tool that uses this config, I cannot definitively confirm it's required.

Comment on lines 3865 to 4321
// Test OCI controller functionality
func Test_OCICommandGeneration(t *testing.T) {
	svcOpts := &serviceOptions{
		PodImage:         "podimage:latest",
		DriverImage:      "driver:latest",
		ImagePullPolicy:  corev1.PullAlways,
		MaxBatchSize:     20,
		DefaultBatchSize: "4",
	}

	t.Run("WithOCIOutput", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
					},
				},
			},
		}

		cmds := generateCmd(svcOpts, job, NewDefaultPermissionConfig())
		assert.Contains(t, cmds, "--upload-to-oci", "Should include OCI upload flag")
	})

	t.Run("WithoutOCIOutput", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
			},
		}

		cmds := generateCmd(svcOpts, job, NewDefaultPermissionConfig())
		assert.NotContains(t, cmds, "--upload-to-oci", "Should not include OCI upload flag")
	})
}

func Test_OCIPodConfiguration(t *testing.T) {
	logger := log.FromContext(context.Background())
	svcOpts := &serviceOptions{
		PodImage:         "podimage:latest",
		DriverImage:      "driver:latest",
		ImagePullPolicy:  corev1.PullAlways,
		MaxBatchSize:     20,
		DefaultBatchSize: "4",
	}

	t.Run("BasicOCIConfiguration", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test-oci",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
						UsernameRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "username",
						},
						PasswordRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "password",
						},
					},
				},
			},
		}

		// Debug: Check if HasOCIOutput returns true
		hasOCI := job.Spec.HasOCIOutput()
		assert.True(t, hasOCI, "Job should have OCI output configured")

		pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
		assert.NotNil(t, pod, "Should generate pod successfully")

		// Check environment variables
		envVars := pod.Spec.Containers[0].Env
		envMap := make(map[string]corev1.EnvVar)
		for _, env := range envVars {
			envMap[env.Name] = env
		}

		// Verify OCI environment variables are set
		assert.Contains(t, envMap, "OCI_REGISTRY", "Should have OCI_REGISTRY env var")
		assert.Contains(t, envMap, "OCI_REPOSITORY", "Should have OCI_REPOSITORY env var")
		assert.Contains(t, envMap, "OCI_PATH", "Should have OCI_PATH env var")
		assert.Contains(t, envMap, "OCI_VERIFY_SSL", "Should have OCI_VERIFY_SSL env var")
		assert.Contains(t, envMap, "OCI_USERNAME", "Should have OCI_USERNAME env var when auth is configured")
		assert.Contains(t, envMap, "OCI_PASSWORD", "Should have OCI_PASSWORD env var when auth is configured")
		assert.NotContains(t, envMap, "OCI_TOKEN", "Should not have OCI_TOKEN env var without token auth")

		// Verify environment variable sources
		assert.Equal(t, "oci-secret", envMap["OCI_REGISTRY"].ValueFrom.SecretKeyRef.Name)
		assert.Equal(t, "registry", envMap["OCI_REGISTRY"].ValueFrom.SecretKeyRef.Key)
		assert.Equal(t, "myorg/evaluation-results", envMap["OCI_REPOSITORY"].Value)
		assert.Equal(t, "results", envMap["OCI_PATH"].Value)
		assert.Equal(t, "true", envMap["OCI_VERIFY_SSL"].Value)
		assert.Equal(t, "oci-secret", envMap["OCI_USERNAME"].ValueFrom.SecretKeyRef.Name)
		assert.Equal(t, "username", envMap["OCI_USERNAME"].ValueFrom.SecretKeyRef.Key)
		assert.Equal(t, "oci-secret", envMap["OCI_PASSWORD"].ValueFrom.SecretKeyRef.Name)
		assert.Equal(t, "password", envMap["OCI_PASSWORD"].ValueFrom.SecretKeyRef.Key)
	})

	t.Run("OCIWithDockerConfigJson", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test-oci-dockerconfig",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
						DockerConfigJsonRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-docker"},
							Key:                  ".dockerconfigjson",
						},
					},
				},
			},
		}

		pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
		assert.NotNil(t, pod, "Should generate pod successfully")

		// Check environment variables
		envVars := pod.Spec.Containers[0].Env
		envMap := make(map[string]corev1.EnvVar)
		for _, env := range envVars {
			envMap[env.Name] = env
		}

		// Verify
		assert.NotContains(t, envMap, "OCI_USERNAME", "Should not have OCI_USERNAME env var")
		assert.NotContains(t, envMap, "OCI_PASSWORD", "Should not have OCI_PASSWORD env var")

		// Verify volume mount
		volumeMounts := pod.Spec.Containers[0].VolumeMounts
		var dockerConfigMount *corev1.VolumeMount
		for i := range volumeMounts {
			if volumeMounts[i].Name == "docker-config" {
				dockerConfigMount = &volumeMounts[i]
				break
			}
		}
		assert.NotNil(t, dockerConfigMount, "Should have docker-config volume mount")
		assert.Equal(t, "/tmp/.docker/config.json", dockerConfigMount.MountPath)
		assert.Equal(t, "config.json", dockerConfigMount.SubPath)
		assert.True(t, dockerConfigMount.ReadOnly)

		// Verify volume source
		volumes := pod.Spec.Volumes
		var dockerConfigVolume *corev1.Volume
		for i := range volumes {
			if volumes[i].Name == "docker-config" {
				dockerConfigVolume = &volumes[i]
				break
			}
		}
		assert.NotNil(t, dockerConfigVolume, "Should have docker-config volume")
		assert.NotNil(t, dockerConfigVolume.VolumeSource.Secret)
		assert.Equal(t, "oci-docker", dockerConfigVolume.VolumeSource.Secret.SecretName)
		assert.Equal(t, ".dockerconfigjson", dockerConfigVolume.VolumeSource.Secret.Items[0].Key)
		assert.Equal(t, "config.json", dockerConfigVolume.VolumeSource.Secret.Items[0].Path)
	})

	t.Run("OCIWithCustomTag", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test-oci-tag",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
						Tag:        "custom-tag-v1.0",
						DockerConfigJsonRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-docker"},
							Key:                  ".dockerconfigjson",
						},
					},
				},
			},
		}

		pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
		assert.NotNil(t, pod, "Should generate pod successfully")

		// Check environment variables
		envVars := pod.Spec.Containers[0].Env
		envMap := make(map[string]corev1.EnvVar)
		for _, env := range envVars {
			envMap[env.Name] = env
		}

		// Verify custom tag is set
		assert.Contains(t, envMap, "OCI_TAG", "Should have OCI_TAG env var")
		assert.Equal(t, "custom-tag-v1.0", envMap["OCI_TAG"].Value)
	})

	t.Run("OCIWithSubject", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test-oci-subject",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
						Subject:    "llama-2-7b-chat",
						DockerConfigJsonRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-docker"},
							Key:                  ".dockerconfigjson",
						},
					},
				},
			},
		}

		pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
		assert.NotNil(t, pod, "Should generate pod successfully")

		// Check environment variables
		envVars := pod.Spec.Containers[0].Env
		envMap := make(map[string]corev1.EnvVar)
		for _, env := range envVars {
			envMap[env.Name] = env
		}

		// Verify subject is set
		assert.Contains(t, envMap, "OCI_SUBJECT", "Should have OCI_SUBJECT env var")
		assert.Equal(t, "llama-2-7b-chat", envMap["OCI_SUBJECT"].Value)
	})

	t.Run("OCIWithSubjectOmitted", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test-oci-subject-omitted",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
						DockerConfigJsonRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-docker"},
							Key:                  ".dockerconfigjson",
						},
					},
				},
			},
		}

		pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
		assert.NotNil(t, pod, "Should generate pod successfully")

		// Check environment variables
		envVars := pod.Spec.Containers[0].Env
		envMap := make(map[string]corev1.EnvVar)
		for _, env := range envVars {
			envMap[env.Name] = env
		}

		// Verify OCI_SUBJECT is not set when omitted
		assert.NotContains(t, envMap, "OCI_SUBJECT", "Should not have OCI_SUBJECT env var when omitted")
	})

	t.Run("OCIWithSubjectEmpty", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test-oci-subject-empty",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
						Subject:    "",
						DockerConfigJsonRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-docker"},
							Key:                  ".dockerconfigjson",
						},
					},
				},
			},
		}

		pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
		assert.NotNil(t, pod, "Should generate pod successfully")

		// Check environment variables
		envVars := pod.Spec.Containers[0].Env
		envMap := make(map[string]corev1.EnvVar)
		for _, env := range envVars {
			envMap[env.Name] = env
		}

		// Verify OCI_SUBJECT is not set when empty
		assert.NotContains(t, envMap, "OCI_SUBJECT", "Should not have OCI_SUBJECT env var when empty")
	})

	t.Run("OCIWithSubjectSpecialChars", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test-oci-subject-special",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
						Subject:    "valid-subject-123",
						DockerConfigJsonRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-docker"},
							Key:                  ".dockerconfigjson",
						},
					},
				},
			},
		}

		pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
		assert.NotNil(t, pod, "Should generate pod successfully")

		// Check environment variables
		envVars := pod.Spec.Containers[0].Env
		envMap := make(map[string]corev1.EnvVar)
		for _, env := range envVars {
			envMap[env.Name] = env
		}

		// Verify subject with valid special characters is set
		assert.Contains(t, envMap, "OCI_SUBJECT", "Should have OCI_SUBJECT env var")
		assert.Equal(t, "valid-subject-123", envMap["OCI_SUBJECT"].Value)
	})

	t.Run("OCIWithSubjectDigestFormat", func(t *testing.T) {
		job := &lmesv1alpha1.LMEvalJob{
			ObjectMeta: v1.ObjectMeta{
				Name:      "test-oci-subject-digest",
				Namespace: "default",
			},
			Spec: lmesv1alpha1.LMEvalJobSpec{
				Model: "test",
				TaskList: lmesv1alpha1.TaskList{
					TaskNames: []string{"task1"},
				},
				Outputs: &lmesv1alpha1.Outputs{
					OCISpec: &lmesv1alpha1.OCISpec{
						Registry: corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
							Key:                  "registry",
						},
						Repository: "myorg/evaluation-results",
						Subject:    "sha256:a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef12345678",
						DockerConfigJsonRef: &corev1.SecretKeySelector{
							LocalObjectReference: corev1.LocalObjectReference{Name: "oci-docker"},
							Key:                  ".dockerconfigjson",
						},
					},
				},
			},
		}

		pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
		assert.NotNil(t, pod, "Should generate pod successfully")

		// Check environment variables
		envVars := pod.Spec.Containers[0].Env
		envMap := make(map[string]corev1.EnvVar)
		for _, env := range envVars {
			envMap[env.Name] = env
		}

		// Verify subject with OCI digest format is set
		assert.Contains(t, envMap, "OCI_SUBJECT", "Should have OCI_SUBJECT env var")
		assert.Equal(t, "sha256:a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef12345678", envMap["OCI_SUBJECT"].Value)
	})
Contributor

⚠️ Potential issue | 🟠 Major

I'll verify the review comment by inspecting the actual controller implementation and comparing it against the test expectations. Let me start by exploring the codebase.
<function_calls>

find . -type f -name "*lmevaljob_controller.go" | head -5

</function_calls>
<function_calls>

wc -l controllers/lmes/lmevaljob_controller.go

</function_calls>
<function_calls>

rg -n "func CreatePod" controllers/lmes/lmevaljob_controller.go

</function_calls>

Now let me extract the CreatePod function and specifically the OCI-related section:
<function_calls>

rg -n "CreatePod|OCI_REGISTRY|OCI_PATH" controllers/lmes/lmevaljob_controller.go | head -30

</function_calls>

Let me look at the CreatePod function more directly:
<function_calls>

sed -n '/^func CreatePod/,/^func [A-Z]/p' controllers/lmes/lmevaljob_controller.go | head -400

</function_calls>

Let me try a different approach to find where OCI environment variables are set:
<function_calls>

rg -n -A 20 "OCI_REGISTRY" controllers/lmes/lmevaljob_controller.go | head -100

</function_calls>

Let me search for where environment variables are appended in the OCI section:
<function_calls>

rg -n -B 5 -A 15 "OCI_REPOSITORY" controllers/lmes/lmevaljob_controller.go | head -150

</function_calls>

Let me get the line numbers and examine the specific region:
<function_calls>

rg -n "if job.Spec.HasOCIOutput" controllers/lmes/lmevaljob_controller.go

</function_calls>

Let me view the relevant section with context:
<function_calls>

sed -n '1350,1550p' controllers/lmes/lmevaljob_controller.go

</function_calls>

Let me try a broader search:
<function_calls>

rg -n "HasOCIOutput" controllers/lmes/lmevaljob_controller.go

</function_calls>

Let me check if OCI_PATH appears anywhere in the controller:
<function_calls>

rg -n "OCI_PATH" controllers/lmes/lmevaljob_controller.go

</function_calls>

Good! Now let me check the test file to see exactly what it expects:
<function_calls>

rg -n "OCI_PATH" controllers/lmes/lmevaljob_controller_test.go

</function_calls>

Let me view the exact test assertions around line 3989:
<function_calls>

sed -n '3979,3995p' controllers/lmes/lmevaljob_controller_test.go

</function_calls>

Now let me find where the OCI env vars are set in the controller to understand the current implementation:
<function_calls>

rg -n -B 10 -A 30 "OCI_REGISTRY.*SecretKeyRef" controllers/lmes/lmevaljob_controller.go | head -200

</function_calls>

Let me search for the pattern more carefully:
<function_calls>

rg -n -B 5 -A 5 "OCI_VERIFY_SSL" controllers/lmes/lmevaljob_controller.go

</function_calls>

Let me look at the entire OCI env var setup block:
<function_calls>

rg -n "append.*OCI_" controllers/lmes/lmevaljob_controller.go | head -20

</function_calls>

Let me check the exact lines where OCI environment variables are appended:
<function_calls>

sed -n '1240,1350p' controllers/lmes/lmevaljob_controller.go

</function_calls>

Let me get a wider view to understand the structure:
<function_calls>

sed -n '1200,1400p' controllers/lmes/lmevaljob_controller.go

</function_calls>

Let me search for the line containing "OCI_REGISTRY" with more context:
<function_calls>

rg -n -B 2 -A 50 '"OCI_REGISTRY"' controllers/lmes/lmevaljob_controller.go | head -200

</function_calls>

Great! Now I can see the OCI env var setup. Let me extract and view the complete OCI environment variables block more clearly:
<function_calls>

sed -n '1282,1380p' controllers/lmes/lmevaljob_controller.go

</function_calls>

Perfect! Now let me verify if OCI_PATH should be there by checking what the test expects:
<function_calls>

sed -n '3975,4010p' controllers/lmes/lmevaljob_controller_test.go

</function_calls>

Let me also check if there's any reference to how OCI_PATH is used elsewhere:
<function_calls>

rg -n "OCI_PATH" --type go | grep -v test.go

</function_calls>

Now let me check the driver code to understand how OCI_PATH is expected to be used:
<function_calls>

find . -type f -name "*.go" -path "*driver*" | head -10

</function_calls>

Let me search across all Go files for uploadToOCI:
<function_calls>

rg -n "uploadToOCI" --type go | head -20

</function_calls>

Let me check the pattern of how other similar env vars are set in the test to understand the structure better:
<function_calls>

sed -n '3975,4015p' controllers/lmes/lmevaljob_controller_test.go

</function_calls>

Let me also check to confirm the current implementation doesn't have OCI_PATH by getting the exact code:
<function_calls>

sed -n '1280,1370p' controllers/lmes/lmevaljob_controller.go

</function_calls>

Perfect! Now I have all the information I need. Let me verify the exact lines one more time to ensure accuracy:
<function_calls>

sed -n '1282,1340p' controllers/lmes/lmevaljob_controller.go | cat -n

</function_calls>

Let me also verify the test assertions one more time:
<function_calls>

sed -n '3988,3992p' controllers/lmes/lmevaljob_controller_test.go

</function_calls>

Now let me check if there are any other tests that validate the OCI_PATH behavior:
<function_calls>

rg -n "OCI_PATH" controllers/lmes/lmevaljob_controller_test.go

</function_calls>

Let me view the failing assertions in detail:
<function_calls>

sed -n '3988,3998p' controllers/lmes/lmevaljob_controller_test.go

</function_calls>

Based on my verification of the codebase, here is the rewritten review comment:


OCI controller tests expose missing OCI_PATH environment variable in CreatePod

The new OCI controller tests are well-targeted:

  • Test_OCICommandGeneration correctly asserts that generateCmd includes --upload-to-oci only when Outputs.OCISpec is present.
  • Test_OCIPodConfiguration checks the right things for basic OCI config, dockerconfig.json-based auth, custom tags, and various Subject forms.

However, the tests will fail due to a genuine gap in the implementation. In Test_OCIPodConfiguration/BasicOCIConfiguration (lines 3988-3990), the test asserts:

assert.Contains(t, envMap, "OCI_PATH", "Should have OCI_PATH env var")
…
assert.Equal(t, "results", envMap["OCI_PATH"].Value)

Inspecting CreatePod in controllers/lmes/lmevaljob_controller.go (lines 1282–1340), the OCI environment variables block sets OCI_REGISTRY, OCI_REPOSITORY, OCI_TAG, optional OCI_SUBJECT, auth variables, and OCI_VERIFY_SSL, but does not set OCI_PATH. This environment variable is expected by all the OCI pod configuration tests (BasicOCIConfiguration, OCIWithDockerConfigJson, OCIWithCustomTag, OCIWithSubject, etc.).

The fix is to wire OCI_PATH in the CreatePod OCI env-var block. Add the following after OCI_REPOSITORY:

{
    Name:  "OCI_PATH",
    Value: "results",
},

Until this environment variable is set, all OCI pod configuration tests will fail.

🧰 Tools
🪛 GitHub Actions: Controller Tests

[error] 3979-3979: OCI_PATH env var missing in OCI pod configuration.


[error] 3989-3989: OCI_PATH value is empty; expected 'results'.

🤖 Prompt for AI Agents
In controllers/lmes/lmevaljob_controller.go around lines 1282–1340, the
CreatePod OCI environment variable block sets OCI_REGISTRY and OCI_REPOSITORY
but omits OCI_PATH (tests expect OCI_PATH="results"); add an EnvVar entry named
OCI_PATH with Value "results" (placed after OCI_REPOSITORY) in the OCI env var
setup so the pod contains OCI_PATH for the OCI-related tests.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
controllers/lmes/driver/driver_test.go (1)

526-585: Consider testing through the public API instead of internal implementation.

This test directly instantiates driverImpl (line 566) and calls uploadToOCI() (line 573), which appear to be internal implementation details. While this works, it tightly couples the test to the internal structure.

Consider whether these scenarios could be tested through the public Driver interface using NewDriver() and Run(), similar to the other tests in this file. This would make the tests more resilient to internal refactoring while still providing coverage of the OCI upload behavior.

If direct testing of uploadToOCI() is necessary for granular unit testing, this approach is acceptable but be aware it creates coupling to internal implementation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58063af and af6d919.

📒 Files selected for processing (2)
  • controllers/lmes/driver/driver.go (3 hunks)
  • controllers/lmes/driver/driver_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • controllers/lmes/driver/driver.go
🧰 Additional context used
🧬 Code graph analysis (1)
controllers/lmes/driver/driver_test.go (3)
controllers/lmes/driver/driver.go (2)
  • NewDriver (117-141)
  • DriverOption (58-77)
cmd/lmes_driver/main.go (1)
  • OutputPath (37-37)
controllers/lmes/constants.go (1)
  • OutputPath (28-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Sourcery review
  • GitHub Check: deploy
  • GitHub Check: build
  • GitHub Check: build-and-push-ci
🔇 Additional comments (2)
controllers/lmes/driver/driver_test.go (2)

477-498: LGTM!

The test correctly validates that the driver completes successfully when OCI upload is disabled. The logic and assertions are appropriate.


500-524: LGTM!

The test correctly validates that enabling OCI upload without setting OCI_REGISTRY results in an appropriate error. The explicit Unsetenv call ensures clean test state.

Signed-off-by: tarilabs <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
controllers/lmes/driver/driver_test.go (1)

447-478: Rename test to reflect the failure scenario being tested.

The test name Test_OCIUploadDefaultFlow suggests a successful flow, but the test actually validates the failure path when the OCI script doesn't exist (line 472 expects an error, line 475 asserts the failure message). This naming is misleading and can confuse developers.

Consider renaming to Test_OCIUploadFailsWhenScriptMissing:

-func Test_OCIUploadDefaultFlow(t *testing.T) {
-	// this test simulates the OCI upload standard flow of ops;
-	// since testing environment does not contain the script,
-	// below we will check the invocation to have failed at script invocation.
+func Test_OCIUploadFailsWhenScriptMissing(t *testing.T) {
+	// Verify that OCI upload fails gracefully when the upload script is missing.
+	// The test environment intentionally lacks the script to validate error handling.
 	info := setupTest(t, true)
controllers/lmes/lmevaljob_controller_test.go (2)

3865-3919: OCI command generation tests: consider adding coverage for Outputs-only and PVC+OCI combinations

The new Test_OCICommandGeneration subtests look correct and match the intended --upload-to-oci behavior when Outputs.OCISpec is set versus when Outputs is absent entirely. To harden this further, you could add two focused cases that were mentioned in an earlier review:

  1. Non-nil Outputs with nil OCISpec – confirm that --upload-to-oci is not added when Outputs exists but OCISpec is nil.

  2. Custom output PVC plus OCI output – confirm that --upload-to-oci is still present when both a PVC output and an OCI output are configured, ensuring the OCI flag depends solely on HasOCIOutput() and not on other output settings.

Example sketch (adapt struct/field names as needed):

 func Test_OCICommandGeneration(t *testing.T) {
   svcOpts := &serviceOptions{
@@
-  t.Run("WithOCIOutput", func(t *testing.T) {
+  t.Run("WithOCIOutput", func(t *testing.T) {
     // existing test...
   })
 
+  t.Run("NonNilOutputsNilOCISpec", func(t *testing.T) {
+    job := &lmesv1alpha1.LMEvalJob{
+      Spec: lmesv1alpha1.LMEvalJobSpec{
+        Model: "test",
+        TaskList: lmesv1alpha1.TaskList{
+          TaskNames: []string{"task1"},
+        },
+        Outputs: &lmesv1alpha1.Outputs{
+          // OCISpec intentionally nil; other fields optional
+        },
+      },
+    }
+    cmds := generateCmd(svcOpts, job, NewDefaultPermissionConfig())
+    assert.NotContains(t, cmds, "--upload-to-oci", "Outputs without OCISpec should not trigger OCI upload")
+  })
+
+  t.Run("CustomPVCAndOCIOutput", func(t *testing.T) {
+    pvcName := "custom-output-pvc"
+    job := &lmesv1alpha1.LMEvalJob{
+      Spec: lmesv1alpha1.LMEvalJobSpec{
+        Model: "test",
+        TaskList: lmesv1alpha1.TaskList{
+          TaskNames: []string{"task1"},
+        },
+        Outputs: &lmesv1alpha1.Outputs{
+          PersistentVolumeClaimName: &pvcName,
+          OCISpec: &lmesv1alpha1.OCISpec{
+            Repository: "example.com/my/repo",
+          },
+        },
+      },
+    }
+    cmds := generateCmd(svcOpts, job, NewDefaultPermissionConfig())
+    assert.Contains(t, cmds, "--upload-to-oci", "OCI output plus PVC should still trigger OCI upload")
+  })

This would lock in the intended semantics of HasOCIOutput() against future regressions.


3921-4320: OCI pod configuration tests: add VerifySSL=false and username/password vs docker-config auth coverage

The Test_OCIPodConfiguration suite does a good job validating the core OCI wiring (env vars, dockerconfig volume, tag and subject handling). Two small additions would make it more complete, echoing earlier review feedback:

  1. Explicit VerifySSL: false behavior

    Add a subtest where OCISpec.VerifySSL is explicitly set false and assert OCI_VERIFY_SSL == "false" (rather than relying only on the default "true" case):

 func Test_OCIPodConfiguration(t *testing.T) {
@@
   t.Run("BasicOCIConfiguration", func(t *testing.T) {
     // existing test...
   })
+
+  t.Run("VerifySSLFalseOverridesDefault", func(t *testing.T) {
+    job := &lmesv1alpha1.LMEvalJob{
+      ObjectMeta: v1.ObjectMeta{
+        Name:      "test-oci-verifyssl-false",
+        Namespace: "default",
+      },
+      Spec: lmesv1alpha1.LMEvalJobSpec{
+        Model: "test",
+        TaskList: lmesv1alpha1.TaskList{
+          TaskNames: []string{"task1"},
+        },
+        Outputs: &lmesv1alpha1.Outputs{
+          OCISpec: &lmesv1alpha1.OCISpec{
+            Registry: corev1.SecretKeySelector{
+              LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
+              Key:                  "registry",
+            },
+            Repository: "myorg/evaluation-results",
+            VerifySSL:  ptr.To(false),
+          },
+        },
+      },
+    }
+
+    pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
+    require.NotNil(t, pod)
+
+    envMap := map[string]corev1.EnvVar{}
+    for _, env := range pod.Spec.Containers[0].Env {
+      envMap[env.Name] = env
+    }
+    assert.Equal(t, "false", envMap["OCI_VERIFY_SSL"].Value, "VerifySSL=false should set OCI_VERIFY_SSL to \"false\"")
+  })
  2. Username/password auth should exclude docker-config volume

    Add a subtest that configures UsernameRef / PasswordRef without DockerConfigJsonRef and asserts that no "docker-config" volume or volumeMount exists:

   t.Run("OCIWithDockerConfigJson", func(t *testing.T) {
     // existing test...
   })
+
+  t.Run("UsernamePasswordAuthExcludesDockerConfigVolume", func(t *testing.T) {
+    job := &lmesv1alpha1.LMEvalJob{
+      ObjectMeta: v1.ObjectMeta{
+        Name:      "test-oci-username-password",
+        Namespace: "default",
+      },
+      Spec: lmesv1alpha1.LMEvalJobSpec{
+        Model: "test",
+        TaskList: lmesv1alpha1.TaskList{
+          TaskNames: []string{"task1"},
+        },
+        Outputs: &lmesv1alpha1.Outputs{
+          OCISpec: &lmesv1alpha1.OCISpec{
+            Registry: corev1.SecretKeySelector{
+              LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
+              Key:                  "registry",
+            },
+            Repository: "myorg/evaluation-results",
+            UsernameRef: &corev1.SecretKeySelector{
+              LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
+              Key:                  "username",
+            },
+            PasswordRef: &corev1.SecretKeySelector{
+              LocalObjectReference: corev1.LocalObjectReference{Name: "oci-secret"},
+              Key:                  "password",
+            },
+          },
+        },
+      },
+    }
+
+    pod := CreatePod(svcOpts, job, NewDefaultPermissionConfig(), logger)
+    require.NotNil(t, pod)
+
+    for _, v := range pod.Spec.Volumes {
+      assert.NotEqual(t, "docker-config", v.Name, "docker-config volume must not be present for username/password auth")
+    }
+    for _, c := range pod.Spec.Containers {
+      for _, m := range c.VolumeMounts {
+        assert.NotEqual(t, "docker-config", m.Name, "docker-config mount must not be present for username/password auth")
+      }
+    }
+  })

As a minor optional cleanup, you could also factor the repeated “env slice → map[string]corev1.EnvVar” logic into a tiny helper to reduce duplication across the OCI subtests, but that’s purely cosmetic.

🧹 Nitpick comments (1)
controllers/lmes/driver/driver.go (1)

265-291: Eliminate command duplication between logging and execution.

The cmd slice is constructed at line 265 for logging, but the actual exec.Command call at lines 286-291 duplicates the same arguments. This creates a maintenance risk where the logged command could diverge from the executed command if one is updated without the other.

Apply this diff to reuse the cmd slice:

 	cmd := []string{"python", "/opt/app-root/src/scripts/oci.py", registryFromEnv, outputPath}
 	fmt.Printf("[DEBUG] OCI upload CLI: %v\n", cmd)
 
 	// List all files and directories in resultsLocation
 	fmt.Printf("[DEBUG] Contents of results location (%s):\n", outputPath)
 	_ = filepath.Walk(outputPath, func(path string, info os.FileInfo, err error) error {
 		if err != nil {
 			fmt.Printf("  [error] %v\n", err)
 			return nil
 		}
 		rel, _ := filepath.Rel(outputPath, path)
 		if rel == "." {
 			fmt.Printf("  %s/\n", rel)
 		} else if info.IsDir() {
 			fmt.Printf("  %s/\n", rel)
 		} else {
 			fmt.Printf("  %s\n", rel)
 		}
 		return nil
 	})
 
-	output, err := exec.Command(
-		"python",
-		"/opt/app-root/src/scripts/oci.py",
-		registryFromEnv,
-		outputPath,
-	).Output()
+	output, err := exec.Command(cmd[0], cmd[1:]...).Output()
 	fmt.Println(string(output))
 	if err != nil {
 		return fmt.Errorf("failed to upload results to OCI: %v", err)
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af6d919 and be802b0.

📒 Files selected for processing (3)
  • controllers/lmes/driver/driver.go (3 hunks)
  • controllers/lmes/driver/driver_test.go (1 hunks)
  • controllers/lmes/lmevaljob_controller_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
controllers/lmes/driver/driver_test.go (3)
controllers/lmes/driver/driver.go (2)
  • NewDriver (117-141)
  • DriverOption (58-77)
cmd/lmes_driver/main.go (1)
  • OutputPath (37-37)
controllers/lmes/constants.go (1)
  • OutputPath (28-28)
controllers/lmes/driver/driver.go (2)
cmd/lmes_driver/main.go (1)
  • OutputPath (37-37)
controllers/lmes/constants.go (1)
  • OutputPath (28-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Sourcery review
  • GitHub Check: deploy
  • GitHub Check: build
  • GitHub Check: build-and-push-ci
🔇 Additional comments (5)
controllers/lmes/driver/driver.go (2)

70-70: LGTM!

The UploadToOCI field addition is clean and consistent with other option flags in the struct.


502-505: LGTM!

The OCI upload integration is correctly placed after results retrieval and properly handles errors by assigning them to err, which triggers the failure block below.

controllers/lmes/driver/driver_test.go (3)

480-501: LGTM!

This test correctly validates that when UploadToOCI is disabled, the job completes successfully without attempting any OCI operations.


503-527: LGTM!

This test properly validates the error handling when OCI_REGISTRY is not set, ensuring users receive a clear error message.


529-588: LGTM!

Excellent table-driven test providing comprehensive coverage of the uploadToOCI() method. The test properly isolates the method and validates all key scenarios: disabled upload, missing registry, and script execution failure.

Contributor

@sourcery-ai sourcery-ai bot left a comment


New security issues found

@openshift-ci

openshift-ci bot commented Nov 28, 2025

@sourcery-ai[bot]: changing LGTM is restricted to collaborators

In response to this:

New security issues found

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci

openshift-ci bot commented Nov 28, 2025

@tarilabs: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/trustyai-service-operator-e2e | 893c736 | link | true | /test trustyai-service-operator-e2e |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

needs-lmes-build Pull requests that need a new build of the LMES driver and job images for CI ok-to-test
